{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T00:56:55Z","timestamp":1774400215496,"version":"3.50.1"},"reference-count":222,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,10,6]],"date-time":"2023-10-06T00:00:00Z","timestamp":1696550400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2024,3,31]]},"abstract":"<jats:p>\n            The recent spread of Deep Learning-based solutions for Artificial Intelligence and the development of Large Language Models has pushed forwards significantly the Natural Language Processing area. The approach has quickly evolved in the last ten years, deeply affecting NLP, from low-level text pre-processing tasks \u2013such as tokenisation or POS tagging\u2013 to high-level, complex NLP applications like machine translation and chatbots. This article examines recent trends in the development of open-domain data-driven generative chatbots, focusing on the\n            <jats:sc>Seq2Seq<\/jats:sc>\n            architectures. Such architectures are compatible with multiple learning approaches, ranging from supervised to reinforcement and, in the last years, allowed to realise very engaging open-domain chatbots. Not only do these architectures allow to directly output the next turn in a conversation but, to some extent, they also allow to control the style or content of the response. To offer a complete view on the subject, we examine possible architecture implementations as well as training and evaluation approaches. Additionally, we provide information about the openly available corpora to train and evaluate such models and about the current and past chatbot competitions. 
Finally, we present some insights on possible future directions, given the current research status.\n          <\/jats:p>","DOI":"10.1145\/3604281","type":"journal-article","created":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T11:58:20Z","timestamp":1686311900000},"page":"1-58","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["A Primer on\n            <scp>Seq2Seq<\/scp>\n            Models for Generative Chatbots"],"prefix":"10.1145","volume":"56","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8765-604X","authenticated-orcid":false,"given":"Vincenzo","family":"Scotti","sequence":"first","affiliation":[{"name":"DEIB, Politecnico di Milano, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5344-5976","authenticated-orcid":false,"given":"Licia","family":"Sbattella","sequence":"additional","affiliation":[{"name":"DEIB, Politecnico di Milano, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2830-4247","authenticated-orcid":false,"given":"Roberto","family":"Tedesco","sequence":"additional","affiliation":[{"name":"DEIB, Politecnico di Milano, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,10,6]]},"reference":[{"key":"e_1_3_3_2_2","unstructured":"Daniel Adiwardana Minh-Thang Luong David R. So Jamie Hall Noah Fiedel Romal Thoppilan Zi Yang Apoorv Kulshreshtha Gaurav Nemade Yifeng Lu and Quoc V. Le. 2020. Towards a human-like open-domain chatbot. arXiv:2001.09977. Retrieved from https:\/\/arxiv.org\/abs\/2001.09977."},{"key":"e_1_3_3_3_2","series-title":"Pattern Recognition. ICPR Int. Workshops and Challenges - Virtual Event, January 10\u201315, 2021, Proceedings, Part II.","first-page":"129","volume":"12662","author":"Agnihotri Manish","year":"2020","unstructured":"Manish Agnihotri, Pooja Rao S. 
B., Dinesh Babu Jayagopi, Sushranth Hebbar, Sowmya Rasipuram, Anutosh Maitra, and Shubhashis Sengupta. 2020. Towards generating topic-driven and affective responses to assist mental wellness. In Pattern Recognition. ICPR Int. Workshops and Challenges - Virtual Event, January 10\u201315, 2021, Proceedings, Part II. Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Giovanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair Escalante, and Roberto Vezzani (Eds.), Lecture Notes in Computer Science, Vol. 12662, Springer, 129\u2013143."},{"key":"e_1_3_3_4_2","unstructured":"Mohammad Aliannejadi Julia Kiseleva Aleksandr Chuklin Jeff Dalton and Mikhail S. Burtsev. 2020. ConvAI3: Generating clarifying questions for open-domain dialogue systems (ClariQ). arXiv:2009.11352. Retrieved from https:\/\/arxiv.org\/abs\/2009.11352."},{"key":"e_1_3_3_5_2","unstructured":"James Allen and Mark Core. 1997. Draft of DAMSL: Dialog act markup in several layers. https:\/\/www.cs.rochester.edu\/research\/cisd\/resources\/damsl\/RevisedManual\/."},{"key":"e_1_3_3_6_2","unstructured":"Sanjeev Arora Yingyu Liang and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. In Proceedings of the 5th International Conference on Learning Representations. OpenReview.net."},{"key":"e_1_3_3_7_2","unstructured":"Lei Jimmy Ba Jamie Ryan Kiros and Geoffrey E. Hinton. 2016. Layer normalization. arXiv:1607.06450. Retrieved from https:\/\/arxiv.org\/abs\/1607.06450."},{"key":"e_1_3_3_8_2","volume-title":"Proceedings of the 3rd Int. Conf. on Learning Representations.","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd Int. Conf. 
on Learning Representations. Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_3_9_2","first-page":"65","volume-title":"Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization@ACL 2005.","author":"Banerjee Satanjeev","year":"2005","unstructured":"Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization@ACL 2005. Jade Goldstein, Alon Lavie, Chin-Yew Lin, and Clare R. Voss (Eds.), Association for Computational Linguistics, 65\u201372."},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.9"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.222"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1165"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1023"},{"key":"e_1_3_3_14_2","first-page":"932","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 13","author":"Bengio Yoshua","year":"2000","unstructured":"Yoshua Bengio, R\u00e9jean Ducharme, and Pascal Vincent. 2000. A neural probabilistic language model. In Proceedings of the Advances in Neural Information Processing Systems 13. Todd K. Leen, Thomas G. Dietterich, and Volker Tresp (Eds.), MIT Press, 932\u2013938."},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944966"},{"key":"e_1_3_3_16_2","doi-asserted-by":"crossref","unstructured":"Yoshua Bengio J\u00e9r\u00f4me Louradour Ronan Collobert and Jason Weston. 2009. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning. Andrea Pohoreckyj Danyluk L\u00e9on Bottou and Michael L. Littman (Eds.) 
ACM 41\u201348.","DOI":"10.1145\/1553374.1553380"},{"key":"e_1_3_3_17_2","first-page":"137","volume-title":"Neural Probabilistic Language Models","author":"Bengio Yoshua","year":"2006","unstructured":"Yoshua Bengio, Holger Schwenk, Jean-S\u00e9bastien Sen\u00e9cal, Fr\u00e9deric Morin, and Jean-Luc Gauvain. 2006. Neural Probabilistic Language Models. Springer, Berlin, 137\u2013186."},{"key":"e_1_3_3_18_2","unstructured":"Nicolas Bertagnolli. 2020. Counsel Chat: Bootstrapping High-Quality Therapy Data. Retrieved from https:\/\/towardsdatascience.com\/counsel-chat-bootstrapping-high-quality-therapy-data-971b419f33da."},{"key":"e_1_3_3_19_2","volume-title":"Pattern Recognition and Machine Learning, 5th Edition","author":"Bishop Christopher M.","year":"2007","unstructured":"Christopher M. Bishop. 2007. Pattern Recognition and Machine Learning, 5th Edition. Springer. Retrieved from https:\/\/www.worldcat.org\/oclc\/71008143."},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_3_3_21_2","unstructured":"Tolga Bolukbasi Kai-Wei Chang James Y. Zou Venkatesh Saligrama and Adam Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. arXiv:1607.06520. Retrieved from http:\/\/arxiv.org\/abs\/1607.06520."},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K16-1002"},{"key":"e_1_3_3_23_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conf. on Neural Information Processing Systems 2020.","author":"Brown Tom B.","year":"2020","unstructured":"Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. 
Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conf. on Neural Information Processing Systems 2020. Hugo Larochelle, Marc\u2019Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.)."},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-5602"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1547"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-94042-7_2"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-008-9076-6"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.70"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.3384\/ecp190003"},{"key":"e_1_3_3_30_2","doi-asserted-by":"crossref","unstructured":"Andrew Caines Helen Yannakoudakis Helena Edmondson Helen Allen Pascual P\u00e9rez-Paredes Bill Byrne and Paula Buttery. 2020. The teacher-student chatroom corpus. arXiv:2011.07109. Retrieved from https:\/\/arxiv.org\/abs\/2011.07109.","DOI":"10.33774\/coe-2020-7thpv"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.35111\/d37s-c536"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.35111\/exq3-x930"},{"key":"e_1_3_3_33_2","unstructured":"Rollo Carpenter. 1997. Cleverbot. https:\/\/www.cleverbot.com\/app."},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.239"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-4012"},{"key":"e_1_3_3_36_2","unstructured":"Junyoung Chung \u00c7aglar G\u00fcl\u00e7ehre KyungHyun Cho and Yoshua Bengio. 2014. 
Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555. Retrieved from https:\/\/arxiv.org\/abs\/1412.3555."},{"key":"e_1_3_3_37_2","volume-title":"ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics.","author":"Collobert Ronan","year":"2007","unstructured":"Ronan Collobert and Jason Weston. 2007. Fast semantic extraction using a novel neural network architecture. In ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. John A. Carroll, Antal van den Bosch, and Annie Zaenen (Eds.), The Association for Computational Linguistics."},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390177"},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078186"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/s0167-6393(02)00071-7"},{"key":"e_1_3_3_41_2","first-page":"76","volume-title":"Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics.","author":"Danescu-Niculescu-Mizil Cristian","year":"2011","unstructured":"Cristian Danescu-Niculescu-Mizil and Lillian Lee. 2011. Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. In Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics. Frank Keller and David Reitter (Eds.), Association for Computational Linguistics, 76\u201387."},{"key":"e_1_3_3_42_2","volume-title":"Proceedings of the 8th Int. Conf. on Learning Representations","author":"Dathathri Sumanth","year":"2020","unstructured":"Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. 2020. Plug and play language models: A simple approach to controlled text generation. In Proceedings of the 8th Int. Conf. on Learning Representations. OpenReview.net."},{"key":"e_1_3_3_43_2","unstructured":"Yann N. 
Dauphin Harm de Vries Junyoung Chung and Yoshua Bengio. 2015. RMSProp and equilibrated adaptive learning rates for non-convex optimization. arXiv:1502.04390.Retrieved from http:\/\/arxiv.org\/abs\/1502.04390."},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"e_1_3_3_45_2","unstructured":"Scott C. Deerwester Susan T. Dumais George W. Furnas Richard A. Harshman Thomas K. Landauer Karen E. Lochbaum and Lynn A. Streeter. 1989. Computer Information Retrieval using Latent Semantic Structure. https:\/\/patents.google.com\/patent\/US4839853A\/en."},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.122"},{"key":"e_1_3_3_47_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.Jill Burstein, Christy Doran, and Thamar Solorio (Eds.), Association for Computational Linguistics, 4171\u20134186."},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2020.101068"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1162\/089120104773633402"},{"key":"e_1_3_3_50_2","doi-asserted-by":"crossref","unstructured":"Emily Dinan Varvara Logacheva Valentin Malykh Alexander H. Miller Kurt Shuster Jack Urbanek Douwe Kiela Arthur Szlam Iulian Serban Ryan Lowe Shrimai Prabhumoye Alan W. Black Alexander I. Rudnicky Jason Williams Joelle Pineau Mikhail S. Burtsev and Jason Weston. 2019. The second conversational intelligence challenge (ConvAI2). arXiv:1902.00098. 
Retrieved from https:\/\/arxiv.org\/abs\/1902.00098.","DOI":"10.1007\/978-3-030-29135-8_7"},{"key":"e_1_3_3_51_2","volume-title":"Proceedings of the 7th Int. Conf. on Learning Representations","author":"Dinan Emily","year":"2019","unstructured":"Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2019. Wizard of wikipedia: Knowledge-powered conversational agents. In Proceedings of the 7th Int. Conf. on Learning Representations. OpenReview.net."},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.98"},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/57167.57214"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-9280.1992.tb00253.x"},{"key":"e_1_3_3_55_2","doi-asserted-by":"publisher","DOI":"10.1207\/s15516709cog1402_1"},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1346"},{"key":"e_1_3_3_57_2","first-page":"240","volume-title":"Proceedings of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.","author":"Fu Hao","year":"2019","unstructured":"Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, and Lawrence Carin. 2019. Cyclical annealing schedule: A simple approach to mitigating KL vanishing. In Proceedings of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.Jill Burstein, Christy Doran, and Thamar Solorio (Eds.), Association for Computational Linguistics, 240\u2013250."},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000074"},{"key":"e_1_3_3_59_2","doi-asserted-by":"crossref","unstructured":"Sarik Ghazarian Johnny Tian-Zheng Wei Aram Galstyan and Nanyun Peng. 2019. Better automatic evaluation of open-domain dialogue systems with contextualized embeddings. arXiv:1904.10635. 
Retrieved from https:\/\/arxiv.org\/abs\/1904.10635.","DOI":"10.18653\/v1\/W19-2310"},{"key":"e_1_3_3_60_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5006"},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1992.225858"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/n19-1061"},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10710-017-9314-z"},{"key":"e_1_3_3_64_2","unstructured":"Google. 2023. Introducing Bard. Retrieved from https:\/\/bard.google.com."},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2019-3079"},{"key":"e_1_3_3_66_2","unstructured":"Jiatao Gu James Bradbury Caiming Xiong Victor O. K. Li and Richard Socher. 2017. Non-autoregressive neural machine translation. arXiv:1711.02281. Retrieved from https:\/\/arxiv.org\/abs\/1711.02281."},{"key":"e_1_3_3_67_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i14.17527"},{"key":"e_1_3_3_68_2","unstructured":"R. Chulaka Gunasekara Seokhwan Kim Luis Fernando D\u2019Haro Abhinav Rastogi Yun-Nung Chen Mihail Eric Behnam Hedayatnia Karthik Gopalakrishnan Yang Liu Chao-Wei Huang Dilek Hakkani-T\u00fcr Jinchao Li Qi Zhu Lingxiao Luo Lars Liden Kaili Huang Shahin Shayandeh Runze Liang Baolin Peng Zheng Zhang Swadheen Shukla Minlie Huang Jianfeng Gao Shikib Mehri Yulan Feng Carla Gordon Seyed Hossein Alavi David R. Traum Maxine Esk\u00e9nazi Ahmad Beirami Eunjoon Cho Paul A. Crook Ankita De Alborz Geramifard Satwik Kottur Seungwhan Moon Shivani Poddar and Rajen Subba. 2020. Overview of the ninth dialog system technology challenge: DSTC9. arXiv:2011.06486. Retrieved from https:\/\/arxiv.org\/abs\/2011.06486."},{"key":"e_1_3_3_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_3_70_2","volume-title":"Proceedings of the 9th Int. Conf. 
on Learning Representations","author":"He Pengcheng","year":"2021","unstructured":"Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. 2021. Deberta: Decoding-enhanced bert with disentangled attention. In Proceedings of the 9th Int. Conf. on Learning Representations. OpenReview.net. https:\/\/openreview.net\/forum?id=XPZIaotutsD."},{"key":"e_1_3_3_71_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_3_72_2","unstructured":"Jordan Hoffmann Sebastian Borgeaud Arthur Mensch Elena Buchatskaya Trevor Cai Eliza Rutherford Diego de Las Casas Lisa Anne Hendricks Johannes Welbl Aidan Clark Tom Hennigan Eric Noland Katie Millican George van den Driessche Bogdan Damoc Aurelia Guy Simon Osindero Karen Simonyan Erich Elsen Jack W. Rae Oriol Vinyals and Laurent Sifre. 2022. Training compute-optimal large language models. arXiv:2203.15556. Retrieved from https:\/\/arxiv.org\/abs\/2203.15556."},{"key":"e_1_3_3_73_2","volume-title":"Proceedings of the 8th Int. Conf. on Learning Representations","author":"Holtzman Ari","year":"2020","unstructured":"Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2020. The curious case of neural text degeneration. In Proceedings of the 8th Int. Conf. on Learning Representations. OpenReview.net."},{"key":"e_1_3_3_74_2","unstructured":"Chiori Hori and Takaaki Hori. 2017. End-to-end conversation modeling track in DSTC6. arXiv:1706.07440. Retrieved from https:\/\/arxiv.org\/abs\/1706.07440."},{"key":"e_1_3_3_75_2","first-page":"2790","volume-title":"Proceedings of the 36th Int. Conf. on Machine Learning.","author":"Houlsby Neil","year":"2019","unstructured":"Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In Proceedings of the 36th Int. Conf. 
on Machine Learning. Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), PMLR, 2790\u20132799."},{"key":"e_1_3_3_76_2","volume-title":"Proceedings of the 11th Int. Conf. on Language Resources and Evaluation","author":"Hsu Chao-Chun","year":"2018","unstructured":"Chao-Chun Hsu, Sheng-Yeh Chen, Chuan-Chun Kuo, Ting-Hao Huang, and Lun-Wei Ku. 2018. EmotionLines: An emotion corpus of multi-party conversations. In Proceedings of the 11th Int. Conf. on Language Resources and Evaluation. European Language Resources Association (ELRA), Miyazaki, Japan."},{"key":"e_1_3_3_77_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383123"},{"key":"e_1_3_3_78_2","volume-title":"Proceedings of the 8th Int. Conf. on Learning Representations","author":"Humeau Samuel","year":"2020","unstructured":"Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. 2020. Poly-encoders: Architectures and pre-training strategies for fast and accurate multi-sentence scoring. In Proceedings of the 8th Int. Conf. on Learning Representations. OpenReview.net."},{"key":"e_1_3_3_79_2","volume-title":"Proceedings of the 5th Int. Conf. on Learning Representations","author":"Inan Hakan","year":"2017","unstructured":"Hakan Inan, Khashayar Khosravi, and Richard Socher. 2017. Tying word vectors and word classifiers: A loss framework for language modeling. In Proceedings of the 5th Int. Conf. on Learning Representations. OpenReview.net."},{"key":"e_1_3_3_80_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0166-4115(97)80111-2"},{"key":"e_1_3_3_81_2","unstructured":"Dan Jurafsky and James H. Martin. 2022. Speech and language processing: An Introduction to natural language processing computational linguistics and speech recognition (3rd ed.). (2022). Draft. 
https:\/\/web.stanford.edu\/jurafsky\/slp3\/."},{"key":"e_1_3_3_82_2","first-page":"2395","volume-title":"Proceedings of the 35th International Conference on Machine Learning.","author":"Kaiser Lukasz","year":"2018","unstructured":"Lukasz Kaiser, Samy Bengio, Aurko Roy, Ashish Vaswani, Niki Parmar, Jakob Uszkoreit, and Noam Shazeer. 2018. Fast decoding in sequence models using discrete latent variables. In Proceedings of the 35th International Conference on Machine Learning.Jennifer G. Dy and Andreas Krause (Eds.), PMLR, 2395\u20132404."},{"key":"e_1_3_3_83_2","first-page":"462","volume-title":"Proceedings of the SIGDIAL 2013 Conf., The 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue","author":"Kim Daejoong","year":"2013","unstructured":"Daejoong Kim, Jaedeug Choi, Kee-Eung Kim, Jungsu Lee, and Jinho Sohn. 2013. Engineering statistical dialog state trackers: A case study on DSTC. In Proceedings of the SIGDIAL 2013 Conf., The 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue. The Association for Computer Linguistics, 462\u2013466."},{"key":"e_1_3_3_84_2","volume-title":"Proceedings of the 3rd Int. Conf. on Learning Representations","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd Int. Conf. on Learning Representations, Yoshua Bengio and Yann LeCun (Eds.), arXiv:1412.6980. Retrieved from http:\/\/arxiv.org\/abs\/1412.6980."},{"key":"e_1_3_3_85_2","volume-title":"Proceedings of the 2nd Int. Conf. on Learning Representations.","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational bayes. In Proceedings of the 2nd Int. Conf. 
on Learning Representations. Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_3_86_2","first-page":"3294","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conf. on Neural Information Processing Systems 2015.","author":"Kiros Ryan","year":"2015","unstructured":"Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-thought vectors. In Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conf. on Neural Information Processing Systems 2015. Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett (Eds.), 3294\u20133302."},{"key":"e_1_3_3_87_2","unstructured":"Mojtaba Komeili Kurt Shuster and Jason Weston. 2021. Internet-augmented dialogue generation. arXiv:2107.07566. Retrieved from https:\/\/arxiv.org\/abs\/2107.07566."},{"key":"e_1_3_3_88_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-2012"},{"key":"e_1_3_3_89_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.404"},{"key":"e_1_3_3_90_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1149"},{"key":"e_1_3_3_91_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"e_1_3_3_92_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1014"},{"key":"e_1_3_3_93_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1094"},{"key":"e_1_3_3_94_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1127"},{"key":"e_1_3_3_95_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.428"},{"key":"e_1_3_3_96_2","first-page":"986","volume-title":"Proceedings of the 8th Int. Joint Conf. on Natural Language Processing (Volume 1: Long Papers)","author":"Li Yanran","year":"2017","unstructured":"Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. 2017. DailyDialog: A manually labelled multi-turn dialogue dataset. In Proceedings of the 8th Int. Joint Conf. 
on Natural Language Processing (Volume 1: Long Papers). Asian Federation of Natural Language Processing, Taipei, Taiwan, 986\u2013995."},{"key":"e_1_3_3_97_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1336"},{"key":"e_1_3_3_98_2","first-page":"74","volume-title":"Proceedings of the Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Text Summarization Branches Out. Association for Computational Linguistics, Barcelona, Spain, 74\u201381."},{"key":"e_1_3_3_99_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1012"},{"key":"e_1_3_3_100_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i09.7098"},{"key":"e_1_3_3_101_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1230"},{"key":"e_1_3_3_102_2","unstructured":"Qi Liu Matt J. Kusner and Phil Blunsom. 2020. A survey on contextual embeddings. arXiv:2003.07278. Retrieved from https:\/\/arxiv.org\/abs\/2003.07278."},{"key":"e_1_3_3_103_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00390"},{"key":"e_1_3_3_104_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. 
arXiv:1907.11692.Retrieved from http:\/\/arxiv.org\/abs\/1907.11692."},{"key":"e_1_3_3_105_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1103"},{"key":"e_1_3_3_106_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W15-4640"},{"key":"e_1_3_3_107_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.219"},{"key":"e_1_3_3_108_2","doi-asserted-by":"publisher","DOI":"10.1145\/3488560.3498509"},{"key":"e_1_3_3_109_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02478259"},{"key":"e_1_3_3_110_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2010.5583006"},{"key":"e_1_3_3_111_2","volume-title":"Proceedings of the 6th Int. Conf. on Learning Representations","author":"Micikevicius Paulius","year":"2018","unstructured":"Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory F. Diamos, Erich Elsen, David Garc\u00eda, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu. 2018. Mixed precision training. In Proceedings of the 6th Int. Conf. on Learning Representations. OpenReview.net."},{"key":"e_1_3_3_112_2","unstructured":"Microsoft. 2023. The New Bing. Retrieved from https:\/\/www.bing.com\/new."},{"key":"e_1_3_3_113_2","volume-title":"Proceedings of the 1st Int. Conf. on Learning Representations.","author":"Mikolov Tom\u00e1s","year":"2013","unstructured":"Tom\u00e1s Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of the 1st Int. Conf. on Learning Representations.Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_3_114_2","first-page":"3111","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conf. on Neural Information Processing Systems 2013.","author":"Mikolov Tom\u00e1s","year":"2013","unstructured":"Tom\u00e1s Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. 
In Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conf. on Neural Information Processing Systems 2013. Christopher J. C. Burges, L\u00e9on Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger (Eds.), 3111\u20133119."},{"key":"e_1_3_3_115_2","first-page":"462","volume-title":"Proceedings of the 8th Int. Joint Conf. on Natural Language Processing.","author":"Mostafazadeh Nasrin","year":"2017","unstructured":"Nasrin Mostafazadeh, Chris Brockett, Bill Dolan, Michel Galley, Jianfeng Gao, Georgios P. Spithourakis, and Lucy Vanderwende. 2017. Image-grounded conversations: Multimodal context for natural question and response generation. In Proceedings of the 8th Int. Joint Conf. on Natural Language Processing. Greg Kondrak and Taro Watanabe (Eds.), Asian Federation of Natural Language Processing, 462\u2013472."},{"key":"e_1_3_3_116_2","unstructured":"Multiple authors. 2013. Counseling and Psychotherapy Transcripts: Volume II. Retrieved from https:\/\/search.alexanderstreet.com\/ctrn."},{"key":"e_1_3_3_117_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.coling-main.599"},{"key":"e_1_3_3_118_2","unstructured":"Huyen T. M. Nguyen and David Morales. 2017. A Neural Chatbot with Personality. (2017). https:\/\/web.stanford.edu\/class\/archive\/cs\/cs224n\/cs224n.1174\/reports\/2761115.pdf."},{"key":"e_1_3_3_119_2","unstructured":"OpenAI. 2022. Introducing ChatGPT. Retrieved from https:\/\/openai.com\/blog\/chatgpt\/."},{"key":"e_1_3_3_120_2","unstructured":"OpenAI. 2023. ChatGPT Plugins. Retrieved from https:\/\/openai.com\/blog\/chatgpt-plugins."},{"key":"e_1_3_3_121_2","unstructured":"OpenAI. 2023. GPT-4 technical report. arXiv:2303.08774. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.08774."},{"key":"e_1_3_3_122_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1049"},{"key":"e_1_3_3_123_2","doi-asserted-by":"publisher","DOI":"10.1145\/2912150"},{"key":"e_1_3_3_124_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. ACL, 311\u2013318."},{"key":"e_1_3_3_125_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.insights-1.5"},{"key":"e_1_3_3_126_2","first-page":"1310","volume-title":"Proceedings of the 30th Int. Conf. on Machine Learning.","author":"Pascanu Razvan","year":"2013","unstructured":"Razvan Pascanu, Tom\u00e1s Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In Proceedings of the 30th Int. Conf. on Machine Learning. JMLR.org, 1310\u20131318."},{"key":"e_1_3_3_127_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_3_128_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/n18-1202"},{"key":"e_1_3_3_129_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.200"},{"key":"e_1_3_3_130_2","doi-asserted-by":"publisher","DOI":"10.5555\/265013"},{"key":"e_1_3_3_131_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1050"},{"key":"e_1_3_3_132_2","doi-asserted-by":"publisher","DOI":"10.5555\/1603899.1603947"},{"key":"e_1_3_3_133_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-2025"},{"key":"e_1_3_3_134_2","unstructured":"Markus N. Rabe and Charles Staats. 2021. Self-attention does not need \\(O(n^{2})\\) memory. arXiv:2112.05682. 
Retrieved from https:\/\/arxiv.org\/abs\/2112.05682."},{"issue":"11","key":"e_1_3_3_135_2","first-page":"12","article-title":"Improving language understanding by generative pre-training","volume":"1","author":"Radford Alec","year":"2018","unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. OpenAI Blog 1, 11 (2018), 12.","journal-title":"OpenAI Blog"},{"issue":"8","key":"e_1_3_3_136_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.","journal-title":"OpenAI Blog"},{"key":"e_1_3_3_137_2","unstructured":"Jack W. Rae Sebastian Borgeaud Trevor Cai Katie Millican Jordan Hoffmann H. Francis Song John Aslanides Sarah Henderson Roman Ring Susannah Young Eliza Rutherford Tom Hennigan Jacob Menick Albin Cassirer Richard Powell George van den Driessche Lisa Anne Hendricks Maribeth Rauh Po-Sen Huang Amelia Glaese Johannes Welbl Sumanth Dathathri Saffron Huang Jonathan Uesato John Mellor Irina Higgins Antonia Creswell Nat McAleese Amy Wu Erich Elsen Siddhant M. Jayakumar Elena Buchatskaya David Budden Esme Sutherland Karen Simonyan Michela Paganini Laurent Sifre Lena Martens Xiang Lorraine Li Adhiguna Kuncoro Aida Nematzadeh Elena Gribovskaya Domenic Donato Angeliki Lazaridou Arthur Mensch Jean-Baptiste Lespiau Maria Tsimpoukelli Nikolai Grigorev Doug Fritz Thibault Sottiaux Mantas Pajarskas Toby Pohlen Zhitao Gong Daniel Toyama Cyprien de Masson d\u2019Autume Yujia Li Tayfun Terzi Vladimir Mikulik Igor Babuschkin Aidan Clark Diego de Las Casas Aurelia Guy Chris Jones James Bradbury Matthew Johnson Blake A. Hechtman Laura Weidinger Iason Gabriel William S. 
Isaac Edward Lockhart Simon Osindero Laura Rimell Chris Dyer Oriol Vinyals Kareem Ayoub Jeff Stanway Lorrayne Bennett Demis Hassabis Koray Kavukcuoglu and Geoffrey Irving. 2021. Scaling language models: Methods analysis & insights from training gopher. arXiv:2112.11446. Retrieved from https:\/\/arxiv.org\/abs\/2112.11446."},{"key":"e_1_3_3_138_2","first-page":"140:1\u2013140:67","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21 (2020), 140:1\u2013140:67. https:\/\/jmlr.org\/papers\/v21\/20-074.html.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_139_2","unstructured":"Ashwin Ram Rohit Prasad Chandra Khatri Anu Venkatesh Raefer Gabriel Qing Liu Jeff Nunn Behnam Hedayatnia Ming Cheng Ashish Nagar Eric King Kate Bland Amanda Wartick Yi Pan Han Song Sk Jayadevan Gene Hwang and Art Pettigrue. 2018. Conversational AI: The science behind the alexa prize. arXiv:1801.03604. Retrieved from https:\/\/arxiv.org\/abs\/1801.03604."},{"key":"e_1_3_3_140_2","first-page":"1","volume-title":"Proceedings of the Joensuu Learning and Instruction Symposium","author":"Randolph Justus J.","year":"2005","unstructured":"Justus J. Randolph. 2005. Free-marginal multirater kappa (multirater \\(\\kappa\\) free): An alternative to fleiss fixed-marginal multirater kappa. In Proceedings of the Joensuu Learning and Instruction Symposium. 1\u201320."},{"key":"e_1_3_3_141_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1534"},{"key":"e_1_3_3_142_2","unstructured":"Scott E. 
Reed Konrad Zolna Emilio Parisotto Sergio Gomez Colmenarejo Alexander Novikov Gabriel Barth-Maron Mai Gimenez Yury Sulsky Jackie Kay Jost Tobias Springenberg Tom Eccles Jake Bruce Ali Razavi Ashley Edwards Nicolas Heess Yutian Chen Raia Hadsell Oriol Vinyals Mahyar Bordbar and Nando de Freitas. 2022. A generalist agent. arXiv:2205.12478. Retrieved from https:\/\/arxiv.org\/abs\/2205.12478."},{"key":"e_1_3_3_143_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_3_3_144_2","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/5236.001.0001","volume-title":"Parallel Distributed Process","author":"Remelhart D. E.","year":"1986","unstructured":"D. E. Remelhart, G. E. Hinton, and J. Williams. 1986. Learning Internal representations by error backpropagation. In Proceedings of the Parallel Distributed Process. The MIT press."},{"key":"e_1_3_3_145_2","first-page":"583","volume-title":"Proceedings of the 2011 Conf. on Empirical Methods in Natural Language Processing","author":"Ritter Alan","year":"2011","unstructured":"Alan Ritter, Colin Cherry, and William B. Dolan. 2011. Data-driven response generation in social media. In Proceedings of the 2011 Conf. on Empirical Methods in Natural Language Processing. ACL, 583\u2013593."},{"key":"e_1_3_3_146_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.24"},{"key":"e_1_3_3_147_2","doi-asserted-by":"publisher","DOI":"10.21236\/AD0256582"},{"key":"e_1_3_3_148_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2023.110273"},{"key":"e_1_3_3_149_2","volume-title":"Proceedings of the 10th Int. Conf. on Learning Representations","author":"Sanh Victor","year":"2022","unstructured":"Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Arun Raja, Manan Dey, M. Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal V. 
Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault F\u00e9vry, Jason Alan Fries, Ryan Teehan, Teven Le Scao, Stella Biderman, Leo Gao, Thomas Wolf, and Alexander M. Rush. 2022. Multitask prompted training enables zero-shot task generalization. In Proceedings of the 10th Int. Conf. on Learning Representations. OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=9Vrb9D0WI4."},{"key":"e_1_3_3_150_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-5901"},{"key":"e_1_3_3_151_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.449"},{"key":"e_1_3_3_152_2","unstructured":"Teven Le Scao Angela Fan Christopher Akiki Ellie Pavlick Suzana Ilic Daniel Hesslow Roman Castagn\u00e9 Alexandra Sasha Luccioni Fran\u00e7ois Yvon Matthias Gall\u00e9 Jonathan Tow Alexander M. Rush Stella Biderman Albert Webson Pawan Sasanka Ammanamanchi Thomas Wang Beno\u00eet Sagot Niklas Muennighoff Albert Villanova del Moral Olatunji Ruwase Rachel Bawden Stas Bekman Angelina McMillan-Major Iz Beltagy Huu Nguyen Lucile Saulnier Samson Tan Pedro Ortiz Suarez Victor Sanh Hugo Lauren\u00e7on Yacine Jernite Julien Launay Margaret Mitchell Colin Raffel Aaron Gokaslan Adi Simhi Aitor Soroa Alham Fikri Aji Amit Alfassy Anna Rogers Ariel Kreisberg Nitzav Canwen Xu Chenghao Mou Chris Emezue Christopher Klamm Colin Leong Daniel van Strien David Ifeoluwa Adelani et\u00a0al. 2022. BLOOM: A 176B-parameter open-access multilingual language model. arXiv:2211.05100. Retrieved from https:\/\/arxiv.org\/abs\/2211.05100."},{"key":"e_1_3_3_153_2","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv:1707.06347. 
Retrieved from https:\/\/arxiv.org\/abs\/1707.06347."},{"key":"e_1_3_3_154_2","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"e_1_3_3_155_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/n19-4011"},{"key":"e_1_3_3_156_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1170"},{"key":"e_1_3_3_157_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1162"},{"key":"e_1_3_3_158_2","doi-asserted-by":"publisher","DOI":"10.5087\/dad.2018.101"},{"key":"e_1_3_3_159_2","unstructured":"Iulian Vlad Serban Chinnadhurai Sankar Mathieu Germain Saizheng Zhang Zhouhan Lin Sandeep Subramanian Taesup Kim Michael Pieper Sarath Chandar Nan Rosemary Ke Sai Mudumba Alexandre de Br\u00e9bisson Jose Sotelo Dendi Suhubdy Vincent Michalski Alexandre Nguyen Joelle Pineau and Yoshua Bengio. 2017. A deep reinforcement learning chatbot. arXiv:1709.02349. Retrieved from https:\/\/arxiv.org\/abs\/1709.02349."},{"key":"e_1_3_3_160_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.9883"},{"key":"e_1_3_3_161_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10983"},{"key":"e_1_3_3_162_2","unstructured":"CEUR Workshop Proceedings Proceedings of the NTCIR 2008 Lifeng Shang Tetsuya Sakai Hang Li Ryuichiro Higashinaka Yusuke Miyao Yuki Arase Masako Nomoto Nicola Ferro Ian Soboroff Overview of the NTCIR-13 short text conversation task. 2017"},{"key":"e_1_3_3_163_2","volume-title":"Proceedings of the 12th NTCIR Conf. on Evaluation of Information Access Technologies, National Center of Sciences.","author":"Shang Lifeng","year":"2016","unstructured":"Lifeng Shang, Tetsuya Sakai, Zhengdong Lu, Hang Li, Ryuichiro Higashinaka, and Yusuke Miyao. 2016. Overview of the NTCIR-12 short text conversation task. In Proceedings of the 12th NTCIR Conf. 
on Evaluation of Information Access Technologies, National Center of Sciences. Noriko Kando, Tetsuya Sakai, and Mark Sanderson (Eds.), National Institute of Informatics (NII)."},{"key":"e_1_3_3_164_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3450097"},{"key":"e_1_3_3_165_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.425"},{"key":"e_1_3_3_166_2","series-title":"Proceedings of the 35th Int. Conf. on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan, Stockholm, Sweden, July 10\u201315, 2018","first-page":"4603","volume":"80","author":"Shazeer Noam","year":"2018","unstructured":"Noam Shazeer and Mitchell Stern. 2018. Adafactor: Adaptive learning rates with sublinear memory cost. In Proceedings of the 35th Int. Conf. on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan, Stockholm, Sweden, July 10\u201315, 2018 (Proceedings of Machine Learning Research, Vol. 80). Jennifer G. Dy and Andreas Krause (Eds.), PMLR, 4603\u20134611. Retrieved from http:\/\/proceedings.mlr.press\/v80\/shazeer18a.html."},{"key":"e_1_3_3_167_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9054379"},{"key":"e_1_3_3_168_2","unstructured":"Kurt Shuster Samuel Humeau Antoine Bordes and Jason Weston. 2018. Engaging image chat: Modeling personality in grounded dialogue. arXiv:1811.00945. Retrieved from https:\/\/arxiv.org\/abs\/1811.00945."},{"key":"e_1_3_3_169_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.222"},{"key":"e_1_3_3_170_2","unstructured":"Kurt Shuster Jing Xu Mojtaba Komeili Da Ju Eric Michael Smith Stephen Roller Megan Ung Moya Chen Kushal Arora Joshua Lane Morteza Behrooz William Ngan Spencer Poff Naman Goyal Arthur Szlam Y.-Lan Boureau Melanie Kambadur and Jason Weston. 2022. BlenderBot 3: A deployed conversational agent that continually learns to responsibly engage. arXiv:2208.03188. 
Retrieved from https:\/\/arxiv.org\/abs\/2208.03188."},{"key":"e_1_3_3_171_2","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1995.1013"},{"key":"e_1_3_3_172_2","unstructured":"Eric Michael Smith Diana Gonzalez-Rico Emily Dinan and Y.-Lan Boureau. 2020. Controlling style in generated dialogue. arXiv:2009.10855. Retrieved from https:\/\/arxiv.org\/abs\/2009.10855."},{"key":"e_1_3_3_173_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.nlp4convai-1.8"},{"key":"e_1_3_3_174_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.183"},{"key":"e_1_3_3_175_2","first-page":"3483","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conf. on Neural Information Processing Systems 2015.","author":"Sohn Kihyuk","year":"2015","unstructured":"Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models. In Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conf. on Neural Information Processing Systems 2015.Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett (Eds.), 3483\u20133491."},{"key":"e_1_3_3_176_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/N15-1020"},{"key":"e_1_3_3_177_2","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539187"},{"key":"e_1_3_3_178_2","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2670313"},{"key":"e_1_3_3_179_2","unstructured":"Yixuan Su and Nigel Collier. 2022. Contrastive search is what you need for neural text generation. arXiv:2210.14140. Retrieved from https:\/\/arxiv.org\/abs\/2210.14140."},{"key":"e_1_3_3_180_2","first-page":"3104","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conf. on Neural Information Processing Systems 2014.","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. 
Sequence to sequence learning with neural networks. In Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conf. on Neural Information Processing Systems 2014. Zoubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger (Eds.), 3104\u20133112."},{"key":"e_1_3_3_181_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA."},{"key":"e_1_3_3_182_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11321"},{"key":"e_1_3_3_183_2","doi-asserted-by":"publisher","DOI":"10.3115\/1075671.1075677"},{"key":"e_1_3_3_184_2","unstructured":"Romal Thoppilan Daniel De Freitas Jamie Hall Noam Shazeer Apoorv Kulshreshtha Heng-Tze Cheng Alicia Jin Taylor Bos Leslie Baker Yu Du YaGuang Li Hongrae Lee Huaixiu Steven Zheng Amin Ghafouri Marcelo Menegali Yanping Huang Maxim Krikun Dmitry Lepikhin James Qin Dehao Chen Yuanzhong Xu Zhifeng Chen Adam Roberts Maarten Bosma Yanqi Zhou Chung-Ching Chang Igor Krivokon Will Rusch Marc Pickett Kathleen S. Meier-Hellstern Meredith Ringel Morris Tulsee Doshi Renelito Delos Santos Toju Duke Johnny Soraker Ben Zevenbergen Vinodkumar Prabhakaran Mark Diaz Ben Hutchinson Kristen Olson Alejandra Molina Erin Hoffman-John Josh Lee Lora Aroyo Ravi Rajakumar Alena Butryna Matthew Lamm Viktoriya Kuzmina Joe Fenton Aaron Cohen Rachel Bernstein Ray Kurzweil Blaise Aguera-Arcas Claire Cui Marian Croak Ed H. Chi and Quoc Le. 2022. LaMDA: Language models for dialog applications. arXiv:2201.08239. Retrieved from https:\/\/arxiv.org\/abs\/2201.08239."},{"key":"e_1_3_3_185_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1062"},{"key":"e_1_3_3_186_2","first-page":"6306","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conf. 
on Neural Information Processing Systems 2017.","author":"Oord A\u00e4ron van den","year":"2017","unstructured":"A\u00e4ron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. 2017. Neural discrete representation learning. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conf. on Neural Information Processing Systems 2017. Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.), 6306\u20136315."},{"key":"e_1_3_3_187_2","first-page":"5998","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conf. on Neural Information Processing Systems 2017.","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conf. on Neural Information Processing Systems 2017. Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.), 5998\u20136008."},{"key":"e_1_3_3_188_2","unstructured":"Anu Venkatesh Chandra Khatri Ashwin Ram Fenfei Guo Raefer Gabriel Ashish Nagar Rohit Prasad Ming Cheng Behnam Hedayatnia Angeliki Metallinou Rahul Goel Shaohua Yang and Anirudh Raju. 2018. On evaluating and comparing conversational agents. arXiv:1801.03625. Retrieved from https:\/\/arxiv.org\/abs\/1801.03625."},{"key":"e_1_3_3_189_2","doi-asserted-by":"publisher","DOI":"10.1016\/0378-2166(79)90034-1"},{"key":"e_1_3_3_190_2","unstructured":"Oriol Vinyals and Quoc V. Le. 2015. A neural conversational model. arXiv:1506.05869. Retrieved from https:\/\/arxiv.org\/abs\/1506.05869."},{"key":"e_1_3_3_191_2","unstructured":"Richard Wallace. 2003. The Elements of AIML Style. 
https:\/\/web.archive.org\/web\/20060510064356http:\/\/www.alicebot.org\/style.pdf."},{"key":"e_1_3_3_192_2","first-page":"181","volume-title":"The Anatomy of A.L.I.C.E.","author":"Wallace Richard S.","year":"2009","unstructured":"Richard S. Wallace. 2009. The Anatomy of A.L.I.C.E. Springer Netherlands, Dordrecht, 181\u2013210."},{"key":"e_1_3_3_193_2","unstructured":"Alex Wang and Kyunghyun Cho. 2019. BERT has a mouth and it must speak: BERT as a Markov random field language model. arXiv:1902.04094. Retrieved from https:\/\/arxiv.org\/abs\/1902.04094."},{"key":"e_1_3_3_194_2","first-page":"22964","volume-title":"Proceedings of the Int. Conf. on Machine Learning.","author":"Wang Thomas","year":"2022","unstructured":"Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, and Colin Raffel. 2022. What language model architecture and pretraining objective works best for zero-shot generalization? In Proceedings of the Int. Conf. on Machine Learning. Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesv\u00e1ri, Gang Niu, and Sivan Sabato (Eds.), PMLR, 22964\u201322984. Retrieved from https:\/\/proceedings.mlr.press\/v162\/wang22u.html."},{"key":"e_1_3_3_195_2","volume-title":"Proceedings of the 10th Int. Conf. on Learning Representations","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. 2022. Finetuned language models are zero-shot learners. In Proceedings of the 10th Int. Conf. on Learning Representations. OpenReview.net. 
Retrieved from https:\/\/openreview.net\/forum?id=gEZrGCozdqR."},{"key":"e_1_3_3_196_2","doi-asserted-by":"publisher","DOI":"10.1145\/3531146.3533088"},{"key":"e_1_3_3_197_2","doi-asserted-by":"publisher","DOI":"10.1145\/357980.357991"},{"key":"e_1_3_3_198_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.96"},{"key":"e_1_3_3_199_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5713"},{"key":"e_1_3_3_200_2","unstructured":"Bernard Widrow. 1960. An Adaptive \u2019adaline\u2019 neuron using chemical \u2019memistors\u2019 1553\u20131552. https:\/\/isl.stanford.edu\/widrow\/papers\/t1960anadaptive.pdf."},{"key":"e_1_3_3_201_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_3_3_202_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_3_203_2","unstructured":"Thomas Wolf Victor Sanh Julien Chaumond and Clement Delangue. 2019. TransferTransfo: A transfer learning approach for neural network based conversational agents. arXiv:1901.08149. Retrieved from https:\/\/arxiv.org\/abs\/1901.08149."},{"key":"e_1_3_3_204_2","unstructured":"Steve Worswick. 2018. Mitsuku wins Loebner Prize 2018! https:\/\/medium.com\/pandorabots-blog\/mitsuku-wins-loebner-prize-2018-3e8d98c5f2a7."},{"key":"e_1_3_3_205_2","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V. Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey Jeff Klingner Apurva Shah Melvin Johnson Xiaobing Liu Lukasz Kaiser Stephan Gouws Yoshikiyo Kato Taku Kudo Hideto Kazawa Keith Stevens George Kurian Nishant Patil Wei Wang Cliff Young Jason Smith Jason Riesa Alex Rudnick Oriol Vinyals Greg Corrado Macduff Hughes and Jeffrey Dean. 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144. 
Retrieved from https:\/\/arxiv.org\/abs\/1609.08144."},{"key":"e_1_3_3_206_2","unstructured":"Jing Xu Da Ju Margaret Li Y-Lan Boureau Jason Weston and Emily Dinan. 2020. Recipes for safety in open-domain chatbots. arXiv:2010.07079. Retrieved from https:\/\/arxiv.org\/abs\/2010.07079."},{"key":"e_1_3_3_207_2","unstructured":"Jing Xu Arthur Szlam and Jason Weston. 2021. Beyond goldfish memory: Long-term open-domain conversation. arXiv:2107.07567. Retrieved from https:\/\/arxiv.org\/abs\/2107.07567."},{"key":"e_1_3_3_208_2","unstructured":"Jing Xu Megan Ung Mojtaba Komeili Kushal Arora Y.-Lan Boureau and Jason Weston. 2022. Learning new skills after deployment: Improving open-domain internet-driven dialogue with human feedback. arXiv:2208.03270. Retrieved from https:\/\/arxiv.org\/abs\/2208.03270."},{"key":"e_1_3_3_209_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00461"},{"key":"e_1_3_3_210_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.41"},{"key":"e_1_3_3_211_2","first-page":"3320","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conf. on Neural Information Processing Systems 2014.","author":"Yosinski Jason","year":"2014","unstructured":"Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks? In Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conf. on Neural Information Processing Systems 2014. Zoubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger (Eds.), 3320\u20133328."},{"key":"e_1_3_3_212_2","first-page":"276","volume-title":"Proceedings of the 33rd Int. Florida Artificial Intelligence Research Society Conf.","author":"Zandie Rohola","year":"2020","unstructured":"Rohola Zandie and Mohammad H. Mahoor. 2020. EmpTransfo: A multi-head transformer architecture for creating empathetic dialog systems. In Proceedings of the 33rd Int. 
Florida Artificial Intelligence Research Society Conf. Roman Bart\u00e1k and Eric Bell (Eds.), AAAI Press, 276\u2013281."},{"key":"e_1_3_3_213_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.392"},{"key":"e_1_3_3_214_2","volume-title":"Proceedings of the 14th NTCIR Conf. on Evaluation of Information Access Technologies.","author":"Zeng Zhaohao","year":"2019","unstructured":"Zhaohao Zeng, Sosuke Kato, and Tetsuya Sakai. 2019. Overview of the NTCIR-14 short text conversation task: Dialogue quality and nugget detection subtasks. In Proceedings of the 14th NTCIR Conf. on Evaluation of Information Access Technologies. Yiqun Kato, Makoto P. Liu, Noriko Kando, and Charles L. A. Clarke (Eds.), Springer."},{"key":"e_1_3_3_215_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1205"},{"key":"e_1_3_3_216_2","volume-title":"Proceedings of the 14th NTCIR Conf.","author":"Zhang Yaoqin","year":"2019","unstructured":"Yaoqin Zhang and Minlie Huang. 2019. Overview of the NTCIR-14 short text generation subtask: Emotion generation challenge. In Proceedings of the 14th NTCIR Conf. Yiqun Kato, Makoto P. Liu, Noriko Kando, and Charles L. A. Clarke (Eds.), Springer."},{"key":"e_1_3_3_217_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-demos.30"},{"key":"e_1_3_3_218_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683090"},{"key":"e_1_3_3_219_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1061"},{"key":"e_1_3_3_220_2","volume-title":"Proceedings of the 7th Int. Conf. on Learning Representations","author":"Zhelezniak Vitalii","year":"2019","unstructured":"Vitalii Zhelezniak, Aleksandar Savkov, April Shen, Francesco Moramarco, Jack Flann, and Nils Y. Hammerla. 2019. Don\u2019t settle for average, go for the max: Fuzzy sets and max-pooled word vectors. In Proceedings of the 7th Int. Conf. on Learning Representations. 
OpenReview.net."},{"key":"e_1_3_3_221_2","doi-asserted-by":"publisher","DOI":"10.1162\/coli_a_00368"},{"key":"e_1_3_3_222_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.235"},{"key":"e_1_3_3_223_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.11"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3604281","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3604281","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:17Z","timestamp":1750178837000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3604281"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,6]]},"references-count":222,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,3,31]]}},"alternative-id":["10.1145\/3604281"],"URL":"https:\/\/doi.org\/10.1145\/3604281","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,6]]},"assertion":[{"value":"2022-06-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-05-31","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-10-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}