{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T21:13:20Z","timestamp":1778188400586,"version":"3.51.4"},"reference-count":226,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,4,17]],"date-time":"2021-04-17T00:00:00Z","timestamp":1618617600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2022,4,30]]},"abstract":"<jats:p>Deep learning--based models have surpassed classical machine learning--based approaches in various text classification tasks, including sentiment analysis, news categorization, question answering, and natural language inference. In this article, we provide a comprehensive review of more than 150 deep learning--based models for text classification developed in recent years, and we discuss their technical contributions, similarities, and strengths. We also provide a summary of more than 40 popular datasets widely used for text classification. 
Finally, we provide a quantitative analysis of the performance of different deep learning models on popular benchmarks, and we discuss future research directions.<\/jats:p>","DOI":"10.1145\/3439726","type":"journal-article","created":{"date-parts":[[2021,4,17]],"date-time":"2021-04-17T10:09:06Z","timestamp":1618654146000},"page":"1-40","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1317,"title":["Deep Learning--based Text Classification"],"prefix":"10.1145","volume":"54","author":[{"given":"Shervin","family":"Minaee","sequence":"first","affiliation":[{"name":"Snapchat Inc., Seattle, WA"}]},{"given":"Nal","family":"Kalchbrenner","sequence":"additional","affiliation":[{"name":"Google Brain, Amsterdam, Netherlands"}]},{"given":"Erik","family":"Cambria","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Nanyang Ave, Singapore"}]},{"given":"Narjes","family":"Nikzad","sequence":"additional","affiliation":[{"name":"University of Tabriz, Bahman Boulevard, Iran"}]},{"given":"Meysam","family":"Chenaghlu","sequence":"additional","affiliation":[{"name":"University of Tabriz, Bahman Boulevard, Iran"}]},{"given":"Jianfeng","family":"Gao","sequence":"additional","affiliation":[{"name":"Microsoft Research, Redmond, WA"}]}],"member":"320","published-online":{"date-parts":[[2021,4,17]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"e_1_2_1_2_1","volume-title":"A neural probabilistic language model. J. Mach. Learn. Res. 3 (Feb","author":"Bengio Yoshua","year":"2003","unstructured":"Yoshua Bengio, R\u00e9jean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A neural probabilistic language model. J. Mach. Learn. Res. 3 (Feb. 
2003), 1137--1155."},{"key":"e_1_2_1_3_1","volume-title":"Advances in Neural Information Processing Systems","author":"Mikolov Tomas","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. MIT Press, 3111--3119."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},{"key":"e_1_2_1_5_1","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. MIT Press, 5998--6008."},{"key":"e_1_2_1_6_1","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. 
Retrieved from https:\/\/s3-us-west-2.amazonaws.com\/openai-assets\/researchcovers\/languageunsupervised\/language understanding paper.pdf."},{"key":"e_1_2_1_7_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding.","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. Retrieved from https:\/\/arXiv:1810.04805."},{"key":"e_1_2_1_8_1","volume-title":"Amanda Askell et\u00a0al","author":"Brown Tom B.","year":"2020","unstructured":"Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell et\u00a0al. 2020. Language models are few-shot learners. Retrieved from https:\/\/arXiv:2005.14165."},{"key":"e_1_2_1_9_1","volume-title":"Gshard: Scaling giant models with conditional computation and automatic sharding.","author":"Lepikhin Dmitry","year":"2020","unstructured":"Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. 2020. 
Gshard: Scaling giant models with conditional computation and automatic sharding. Retrieved from https:\/\/arXiv:2006.16668."},{"key":"e_1_2_1_10_1","volume-title":"Rebooting AI: Building Artificial Intelligence We Can Trust","author":"Marcus Gary","year":"2019","unstructured":"Gary Marcus and Ernest Davis. 2019. Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon."},{"key":"e_1_2_1_11_1","unstructured":"Gary Marcus. 2020. The next decade in ai: Four steps towards robust artificial intelligence. Retrieved from https:\/\/arXiv:2002.06177."},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Yixin Nie Adina Williams Emily Dinan Mohit Bansal Jason Weston and Douwe Kiela. 2019. Adversarial nli: A new benchmark for natural language understanding. Retrieved from https:\/\/arXiv:1910.14599.","DOI":"10.18653\/v1\/2020.acl-main.441"},{"key":"e_1_2_1_13_1","volume-title":"Joey Tianyi Zhou, and Peter Szolovits","author":"Jin Di","year":"2019","unstructured":"Di Jin, Zhijing Jin, Joey Tianyi Zhou, and Peter Szolovits. 2019. Is bert really robust? Natural language attack on text classification and entailment. Retrieved from https:\/\/arXiv:1907.11932 2."},{"key":"e_1_2_1_14_1","unstructured":"Xiaodong Liu Hao Cheng Pengcheng He Weizhu Chen Yu Wang Hoifung Poon and Jianfeng Gao. 2020. 
Adversarial training for large neural language models. Retrieved from https:\/\/arXiv:2004.08994."},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Jacob Andreas Marcus Rohrbach Trevor Darrell and Dan Klein. 2016. Learning to compose neural networks for question answering. Retrieved from https:\/\/arXiv:1601.01705.","DOI":"10.18653\/v1\/N16-1181"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1167"},{"key":"e_1_2_1_17_1","unstructured":"Imanol Schlag Paul Smolensky Roland Fernandez Nebojsa Jojic J\u00fcrgen Schmidhuber and Jianfeng Gao. 2019. Enhancing the transformer with explicit relational encoding for math problem solving. Retrieved from https:\/\/arXiv:1910.06611."},{"key":"e_1_2_1_18_1","unstructured":"Jianfeng Gao Baolin Peng Chunyuan Li Jinchao Li Shahin Shayandeh Lars Liden and Heung-Yeung Shum. 2020. Robust conversational AI with grounded text generation. 
Retrieved from https:\/\/arXiv:2009.03457."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.3390\/info10040150"},{"key":"e_1_2_1_20_1","volume-title":"Introduction to Information Retrieval","author":"Manning Christopher D.","unstructured":"Christopher D. Manning, Hinrich Sch\u00fctze, and Prabhakar Raghavan. 2008. Introduction to Information Retrieval. Cambridge University Press."},{"key":"e_1_2_1_21_1","volume-title":"Martin","author":"Jurasky Daniel","year":"2008","unstructured":"Daniel Jurasky and James H. Martin. 2008. Speech and language processing: An introduction to natural language Processing. Computational Linguistics and Speech Recognition. Prentice Hall, NJ."},{"key":"e_1_2_1_22_1","volume-title":"Glue: A multi-task benchmark and analysis platform for natural language understanding.","author":"Wang Alex","year":"2018","unstructured":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. 2018. Glue: A multi-task benchmark and analysis platform for natural language understanding. Retrieved from https:\/\/arXiv:1804.07461."},{"key":"e_1_2_1_23_1","unstructured":"Xiaodong Liu Pengcheng He Weizhu Chen and Jianfeng Gao. 2019. 
Multi-task deep neural networks for natural language understanding. Retrieved from https:\/\/arXiv:1901.11504."},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Pranav Rajpurkar Jian Zhang Konstantin Lopyrev and Percy Liang. 2016. Squad: 100 000+ questions for machine comprehension of text. Retrieved from https:\/\/arXiv:1606.05250.","DOI":"10.18653\/v1\/D16-1264"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/S14-2001"},{"key":"e_1_2_1_26_1","volume-title":"Deep Learning","author":"Goodfellow Ian","unstructured":"Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press."},{"key":"e_1_2_1_27_1","unstructured":"Tomas Mikolov Kai Chen Greg Corrado and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. Retrieved from https:\/\/arXiv:1301.3781."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1162"},{"key":"e_1_2_1_30_1","unstructured":"Armand Joulin Edouard Grave Piotr Bojanowski Matthijs Douze H\u00e9rve J\u00e9gou and Tomas Mikolov. 2016. Fasttext. zip: Compressing text classification models. 
Retrieved from https:\/\/arXiv:1612.03651."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 90--94","author":"Wang Sida","unstructured":"Sida Wang and Christopher D. Manning. 2012. Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 90--94."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the International Conference on Machine Learning. 1188--1196","author":"Le Quoc","year":"2014","unstructured":"Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning. 1188--1196."},{"key":"e_1_2_1_33_1","unstructured":"Kai Sheng Tai Richard Socher and Christopher D Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. Retrieved from https:\/\/arXiv:1503.00075."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the International Conference on Machine Learning. 1604--1612","author":"Zhu Xiaodan","year":"2015","unstructured":"Xiaodan Zhu , Parinaz Sobihani , and Hongyu Guo . 2015 . Long short-term memory over recursive structures . 
In Proceedings of the International Conference on Machine Learning. 1604--1612."},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Jianpeng Cheng Li Dong and Mirella Lapata. 2016. Long short-term memory-networks for machine reading. Retrieved from https:\/\/arXiv:1601.06733.","DOI":"10.18653\/v1\/D16-1053"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1280"},{"key":"e_1_2_1_37_1","volume-title":"Topicrnn: A recurrent neural network with long-range semantic dependency.","author":"Dieng Adji B.","year":"2016","unstructured":"Adji B. Dieng, Chong Wang, Jianfeng Gao, and John Paisley. 2016. Topicrnn: A recurrent neural network with long-range semantic dependency. Retrieved from https:\/\/arXiv:1611.01702."},{"key":"e_1_2_1_38_1","unstructured":"Pengfei Liu Xipeng Qiu and Xuanjing Huang. 2016. Recurrent neural network for text classification with multi-task learning. Retrieved from https:\/\/arXiv:1605.05101."},{"key":"e_1_2_1_39_1","unstructured":"Rie Johnson and Tong Zhang. 2016. Supervised and semi-supervised text categorization using LSTM for region embeddings. 
Retrieved from https:\/\/arXiv:1602.02373."},{"key":"e_1_2_1_40_1","unstructured":"Peng Zhou Zhenyu Qi Suncong Zheng Jiaming Xu Hongyun Bao and Bo Xu. 2016. Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. Retrieved from https:\/\/arXiv:1611.06639."},{"key":"e_1_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Zhiguo Wang Wael Hamza and Radu Florian. 2017. Bilateral multi-perspective matching for natural language sentences. Retrieved from https:\/\/arXiv:1702.03814.","DOI":"10.24963\/ijcai.2017\/579"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 30th AAAI Conference on Artificial Intelligence.","author":"Wan Shengxian","year":"2016","unstructured":"Shengxian Wan, Yanyan Lan, Jiafeng Guo, Jun Xu, Liang Pang, and Xueqi Cheng. 2016. A deep architecture for semantic matching with multiple positional sentence representations. In Proceedings of the 30th AAAI Conference on Artificial Intelligence."},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1631--1642","author":"Socher Richard","year":"2013","unstructured":"Richard Socher , Alex Perelygin , Jean Wu , Jason Chuang , Christopher D Manning , Andrew Y. Ng , and Christopher Potts . 2013 . Recursive deep models for semantic compositionality over a sentiment treebank . 
In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1631--1642."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1062"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080834"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/N15-1011"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1052"},{"key":"e_1_2_1_50_1","volume-title":"Advances in Neural Information Processing Systems","author":"Zhang Xiang","unstructured":"Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems. MIT Press, 649--657."},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the 30th AAAI Conference on Artificial Intelligence.","author":"Kim Yoon","unstructured":"Yoon Kim, Yacine Jernite, David Sontag, and Alexander M. Rush. 2016. Character-aware neural language models. 
In Proceedings of the 30th AAAI Conference on Artificial Intelligence."},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the IEEE 17th International Conference on Information Reuse and Integration (IRI\u201916)","author":"Joseph","year":"2016","unstructured":"Joseph D. Prusa and Taghi M. Khoshgoftaar. 2016. Designing a better data representation for deep neural networks and text classification. In Proceedings of the IEEE 17th International Conference on Information Reuse and Integration (IRI\u201916). DOI:http:\/\/dx.doi.org\/10.1109\/IRI.2016.61"},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915). Retrieved from https:\/\/arxiv:1409.1556."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Alexis Conneau Holger Schwenk Lo\u00efc Barrault and Yann Lecun. 2016. Very deep convolutional networks for text classification. 
Retrieved from https:\/\/arXiv:1606.01781.","DOI":"10.18653\/v1\/E17-1104"},{"key":"e_1_2_1_56_1","series-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). DOI:http:\/\/dx.doi.org\/10.1007\/978-3-030-30487-4_16","volume-title":"Squeezed very deep convolutional neural networks for text classification","author":"Duque Andr\u00e9a B.","unstructured":"Andr\u00e9a B. Duque, Lu\u00e3 L\u00e1zaro J. Santos, David Mac\u00eado, and Cleber Zanchettin. 2019. Squeezed very deep convolutional neural networks for text classification. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). DOI:http:\/\/dx.doi.org\/10.1007\/978-3-030-30487-4_16 Retrieved from https:\/\/arxiv:1901.09821."},{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the Workshops at the 32nd AAAI Conference on Artificial Intelligence.","author":"Le Hoa T.","year":"2018","unstructured":"Hoa T. Le, Christophe Cerisara, and Alexandre Denis. 2018. Do convolutional networks need to be deep for text classification? 
In Proceedings of the Workshops at the 32nd AAAI Conference on Artificial Intelligence."},{"key":"e_1_2_1_58_1","volume-title":"Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Huang Gao","year":"2017","unstructured":"Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. 2017. Densely connected convolutional networks. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). DOI:http:\/\/dx.doi.org\/10.1109\/CVPR.2017.243 arxiv:1608.06993"},{"key":"e_1_2_1_59_1","volume-title":"Improving text classification with weighted word embeddings via a multi-channel TextCNN model. Neurocomputing","author":"Guo Bao","year":"2019","unstructured":"Bao Guo, Chunxia Zhang, Junmin Liu, and Xiaoyi Ma. 2019. Improving text classification with weighted word embeddings via a multi-channel TextCNN model. Neurocomputing (2019). DOI:http:\/\/dx.doi.org\/10.1016\/j.neucom.2019.07.052"},{"key":"e_1_2_1_60_1","volume-title":"A sensitivity analysis of (and practitioners","author":"Zhang Ye","unstructured":"Ye Zhang and Byron Wallace. 2015. 
A sensitivity analysis of (and practitioners\u2019 guide to) convolutional neural networks for sentence classification. Retrieved from https:\/\/arXiv:1510.03820."},{"key":"e_1_2_1_61_1","doi-asserted-by":"crossref","unstructured":"Lili Mou Rui Men Ge Li Yan Xu Lu Zhang Rui Yan and Zhi Jin. 2015. Natural language inference by tree-based convolution and heuristic matching. Retrieved from https:\/\/arXiv:1512.08422.","DOI":"10.18653\/v1\/P16-2022"},{"key":"e_1_2_1_62_1","volume-title":"Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI\u201916)","author":"Pang Liang","year":"2016","unstructured":"Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Shengxian Wan, and Xueqi Cheng. 2016. Text matching as image recognition. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI\u201916). Retrieved from https:\/\/arxiv:1602.06359."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/406"},{"key":"e_1_2_1_64_1","doi-asserted-by":"crossref","unstructured":"Sarvnaz Karimi Xiang Dai Hamedh Hassanzadeh and Anthony Nguyen. 2017. Automatic diagnosis coding of radiology reports: A comparison of deep learning and conventional classification methods. BioNLP. 
DOI:http:\/\/dx.doi.org\/10.18653\/v1\/w17-2342","DOI":"10.18653\/v1\/W17-2342"},{"key":"e_1_2_1_65_1","volume-title":"DeepMeSH: Deep semantic representation for improving large-scale MeSH indexing. Bioinformatics","author":"Peng Shengwen","year":"2016","unstructured":"Shengwen Peng, Ronghui You, Hongning Wang, Chengxiang Zhai, Hiroshi Mamitsuka, and Shanfeng Zhu. 2016. DeepMeSH: Deep semantic representation for improving large-scale MeSH indexing. Bioinformatics (2016). Retrieved from DOI:http:\/\/dx.doi.org\/10.1093\/bioinformatics\/btw294"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808719.2808746"},{"key":"e_1_2_1_67_1","unstructured":"Mark Hughes Irene Li Spyros Kotoulas and Toyotaro Suzumura. 2017. Medical text classification using convolutional neural networks. Studies Health Technol. Info. (2017). DOI:http:\/\/dx.doi.org\/10.3233\/978-1-61499-753-5-246 Retrieved from https:\/\/arxiv:1704.06841."},{"key":"e_1_2_1_68_1","volume-title":"Wang","author":"Hinton Geoffrey E.","year":"2011","unstructured":"Geoffrey E. Hinton, Alex Krizhevsky, and Sida D. Wang. 2011. Transforming auto-encoders. In Proceedings of the International Conference on Artificial Neural Networks. 
Springer, 44--51."},{"key":"e_1_2_1_69_1","volume-title":"Hinton","author":"Sabour Sara","year":"2017","unstructured":"Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. 2017. Dynamic routing between capsules. In Advances in Neural Information Processing Systems. MIT Press, 3856--3866."},{"key":"e_1_2_1_70_1","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)","author":"Sabour Sara","year":"2018","unstructured":"Sara Sabour, Nicholas Frosst, and Geoffrey Hinton. 2018. Matrix capsules with EM routing. In Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918). 1--15."},{"key":"e_1_2_1_71_1","unstructured":"Wei Zhao Jianbo Ye Min Yang Zeyang Lei Suofei Zhang and Zhou Zhao. 2018. Investigating capsule networks with dynamic routing for text classification. Retrieved from https:\/\/arXiv:1804.00538."},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2019.06.014"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1150"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2019.10.033"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-2045"},{"key":"e_1_2_1_76_1","unstructured":"Hao Ren and Hong Lu. 2018. Compositional coding capsule network with k-means routing for text classification. 
Retrieved from https:\/\/arXiv:1810.09177."},{"key":"e_1_2_1_77_1","unstructured":"Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. Retrieved from https:\/\/arXiv:1409.0473."},{"key":"e_1_2_1_78_1","doi-asserted-by":"crossref","unstructured":"Minh-Thang Luong Hieu Pham and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. Retrieved from https:\/\/arXiv:1508.04025.","DOI":"10.18653\/v1\/D15-1166"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1174"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1024"},{"key":"e_1_2_1_81_1","volume-title":"Proceedings of the 32nd AAAI Conference on Artificial Intelligence.","author":"Shen Tao","year":"2018","unstructured":"Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, and Chengqi Zhang. 2018. Disan: Directional self-attention network for rnn\/cnn-free language understanding. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence."},{"key":"e_1_2_1_82_1","unstructured":"Yang Liu Chengjie Sun Lei Lin and Xiaolong Wang. 2016.
Learning natural language inference using bidirectional LSTM model and inner-attention. Retrieved from https:\/\/arXiv:1605.09090."},{"key":"e_1_2_1_83_1","unstructured":"Cicero dos Santos Ming Tan Bing Xiang and Bowen Zhou. 2016. Attentive pooling networks. Retrieved from https:\/\/arXiv:1602.03609."},{"key":"e_1_2_1_84_1","doi-asserted-by":"crossref","unstructured":"Guoyin Wang Chunyuan Li Wenlin Wang Yizhe Zhang Dinghan Shen Xinyuan Zhang Ricardo Henao and Lawrence Carin. 2018. Joint embedding of words and labels for text classification. Retrieved from https:\/\/arXiv:1805.04174.","DOI":"10.18653\/v1\/P18-1216"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016586"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00097"},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2018\/613"},{"key":"e_1_2_1_88_1","volume-title":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 287--296","author":"Yang Liu","unstructured":"Liu Yang, Qingyao Ai, Jiafeng Guo, and W. Bruce Croft. 2016. aNMM: Ranking short answer texts with attention-based neural matching model. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 287--296."},{"key":"e_1_2_1_89_1","volume-title":"Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio.","author":"Lin Zhouhan","year":"2017","unstructured":"Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. A structured self-attentive sentence embedding. Retrieved from https:\/\/arXiv:1703.03130."},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2018\/621"},{"key":"e_1_2_1_91_1","doi-asserted-by":"crossref","unstructured":"Ikuya Yamada and Hiroyuki Shindo. 2019. Neural attentive bag-of-entities model for text classification. Retrieved from https:\/\/arXiv:1909.01259.","DOI":"10.18653\/v1\/K19-1052"},{"key":"e_1_2_1_92_1","doi-asserted-by":"crossref","unstructured":"Ankur P. Parikh Oscar Tackstrom Dipanjan Das and Jakob Uszkoreit. 2016. A decomposable attention model for natural language inference. Retrieved from https:\/\/arXiv:1606.01933.","DOI":"10.18653\/v1\/D16-1244"},{"key":"e_1_2_1_93_1","unstructured":"Qian Chen Zhen-Hua Ling and Xiaodan Zhu. 2018. Enhancing sentence embedding with generalized pooling. Retrieved from https:\/\/arXiv:1806.09828."},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2020.08.005"},{"key":"e_1_2_1_95_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-1038"},{"key":"e_1_2_1_96_1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)","author":"Weston Jason","year":"2015","unstructured":"Jason Weston, Sumit Chopra, and Antoine Bordes. 2015. Memory networks. In Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915). Retrieved from https:\/\/arxiv:1410.3916."},{"key":"e_1_2_1_97_1","volume-title":"Rob Fergus et\u00a0al","author":"Sukhbaatar Sainbayar","year":"2015","unstructured":"Sainbayar Sukhbaatar, Jason Weston, Rob Fergus et\u00a0al. 2015. End-to-end memory networks. In Advances in Neural Information Processing Systems. MIT Press, 2440--2448."},{"key":"e_1_2_1_98_1","volume-title":"Proceedings of the 33rd International Conference on Machine Learning (ICML\u201916)","author":"Kumar Ankit","year":"2016","unstructured":"Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher. 2016. Ask me anything: Dynamic memory networks for natural language processing. In Proceedings of the 33rd International Conference on Machine Learning (ICML\u201916). Retrieved from https:\/\/arXiv:1506.07285."},{"key":"e_1_2_1_99_1","volume-title":"Proceedings of the 33rd International Conference on Machine Learning (ICML\u201916)","author":"Xiong Caiming","year":"2016","unstructured":"Caiming Xiong, Stephen Merity, and Richard Socher. 2016. Dynamic memory networks for visual and textual question answering. In Proceedings of the 33rd International Conference on Machine Learning (ICML\u201916). Retrieved from https:\/\/arxiv:1603.01417."},{"key":"e_1_2_1_100_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. 404--411","author":"Mihalcea Rada","year":"2004","unstructured":"Rada Mihalcea and Paul Tarau. 2004. Textrank: Bringing order into text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 404--411."},{"key":"e_1_2_1_101_1","volume-title":"Yu","author":"Wu Zonghan","year":"2019","unstructured":"Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2019. A comprehensive survey on graph neural networks. Retrieved from https:\/\/arXiv:1901.00596."},{"key":"e_1_2_1_102_1","unstructured":"Thomas N Kipf and Max Welling. 2016.
Semi-supervised classification with graph convolutional networks. Retrieved from https:\/\/arXiv:1609.02907."},{"key":"e_1_2_1_103_1","volume-title":"Advances in Neural Information Processing Systems","author":"Hamilton Will","unstructured":"Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. MIT Press, 1024--1034."},{"key":"e_1_2_1_104_1","unstructured":"Petar Veli\u010dkovi\u0107 Guillem Cucurull Arantxa Casanova Adriana Romero Pietro Lio and Yoshua Bengio. 2017. Graph attention networks. Retrieved from https:\/\/arXiv:1710.10903."},{"key":"e_1_2_1_105_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3186005"},{"key":"e_1_2_1_106_1","volume-title":"Yu","author":"Peng Hao","year":"2019","unstructured":"Hao Peng, Jianxin Li, Qiran Gong, Senzhang Wang, Lifang He, Bo Li, Lihong Wang, and Philip S. Yu. 2019. Hierarchical taxonomy-aware and attentional graph capsule RCNNs for large-scale multi-label text classification. Retrieved from https:\/\/arXiv:1906.04898."},{"key":"e_1_2_1_107_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33017370"},{"key":"e_1_2_1_108_1","volume-title":"Christopher Fifty, Tao Yu, and Kilian Q. Weinberger.","author":"Wu Felix","year":"2019","unstructured":"Felix Wu, Tianyi Zhang, Amauri Holanda de Souza Jr., Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. 2019. Simplifying graph convolutional networks. Retrieved from https:\/\/arXiv:1902.07153."},{"key":"e_1_2_1_109_1","doi-asserted-by":"crossref","unstructured":"Lianzhe Huang Dehong Ma Sujian Li Xiaodong Zhang and Houfeng Wang. 2019. Text level graph neural network for text classification. Retrieved from https:\/\/arXiv:1910.02356.","DOI":"10.18653\/v1\/D19-1345"},{"key":"e_1_2_1_110_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016762"},{"key":"e_1_2_1_111_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001493000339"},{"key":"e_1_2_1_112_1","volume-title":"Proceedings of the 15th Conference on Computational Natural Language Learning (CoNLL\u201911)","author":"Yih Wen","year":"2011","unstructured":"Wen tau Yih, Kristina Toutanova, John C. Platt, and Christopher Meek. 2011. Learning discriminative projections for text similarity measures. In Proceedings of the 15th Conference on Computational Natural Language Learning (CoNLL\u201911)."},{"key":"e_1_2_1_113_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505665"},{"key":"e_1_2_1_114_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661829.2661935"},{"key":"e_1_2_1_115_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000074"},{"key":"e_1_2_1_116_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766462.2767738"},{"key":"e_1_2_1_117_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1036"},{"key":"e_1_2_1_118_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1044"},{"key":"e_1_2_1_119_1","volume-title":"Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI\u201916)","author":"Mueller Jonas","year":"2016","unstructured":"Jonas Mueller and Aditya Thyagarajan. 2016. Siamese recurrent architectures for learning sentence similarity. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI\u201916)."},{"key":"e_1_2_1_120_1","doi-asserted-by":"crossref","unstructured":"Paul Neculoiu Maarten Versteegh and Mihai Rotaru. 2016. Learning text similarity with siamese recurrent networks. Retrieved from DOI:http:\/\/dx.doi.org\/10.18653\/v1\/w16-1617.","DOI":"10.18653\/v1\/W16-1617"},{"key":"e_1_2_1_121_1","unstructured":"Pengfei Liu Xipeng Qiu and Xuanjing Huang. 2016. Modelling interaction of sentence pair with coupled-lstms. Retrieved from https:\/\/arXiv:1605.05573."},{"key":"e_1_2_1_122_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1181"},{"key":"e_1_2_1_123_1","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL\u201916)","author":"Renter Tom","year":"2016","unstructured":"Tom Renter, Alexey Borisov, and Maarten De Rijke. 2016. Siamese CBOW: Optimizing word embeddings for sentence representations. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL\u201916). DOI:http:\/\/dx.doi.org\/10.18653\/v1\/p16-1089 arxiv:1606.04640"},{"key":"e_1_2_1_124_1","doi-asserted-by":"crossref","unstructured":"Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using siamese BERT-Networks. DOI:http:\/\/dx.doi.org\/10.18653\/v1\/d19-1410 Retrieved from https:\/\/arxiv:1908.10084.","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_2_1_125_1","unstructured":"Wenhao Lu Jian Jiao and Ruofei Zhang. 2020. TwinBERT: Distilling knowledge to twin-structured BERT models for efficient retrieval. Retrieved from https:\/\/arXiv:2002.06275."},{"key":"e_1_2_1_126_1","unstructured":"Ming Tan Cicero dos Santos Bing Xiang and Bowen Zhou. 2015. LSTM-based deep learning models for non-factoid answer selection.
Retrieved from https:\/\/arXiv:1511.04108."},{"key":"e_1_2_1_127_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159664"},{"key":"e_1_2_1_128_1","doi-asserted-by":"publisher","DOI":"10.1109\/GlobalSIP.2017.8309095"},{"key":"e_1_2_1_129_1","unstructured":"Chunting Zhou Chonglin Sun Zhiyuan Liu and Francis Lau. 2015. A C-LSTM neural network for text classification. Retrieved from https:\/\/arXiv:1511.08630."},{"key":"e_1_2_1_130_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1177"},{"key":"e_1_2_1_131_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2017.7966144"},{"key":"e_1_2_1_132_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1167"},{"key":"e_1_2_1_133_1","unstructured":"Yijun Xiao and Kyunghyun Cho. 2016. Efficient character-level document classification by combining convolution and recurrent layers. Retrieved from https:\/\/arXiv:1602.00367."},{"key":"e_1_2_1_134_1","volume-title":"Proceedings of the 29th AAAI Conference on Artificial Intelligence.","author":"Lai Siwei","year":"2015","unstructured":"Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Recurrent convolutional neural networks for text classification. In Proceedings of the 29th AAAI Conference on Artificial Intelligence."},{"key":"e_1_2_1_135_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2016.10.065"},{"key":"e_1_2_1_136_1","volume-title":"Proceedings of the 16th IEEE International Conference on Machine Learning and Applications (ICMLA\u201917)","author":"Kowsari Kamran","unstructured":"Kamran Kowsari, Donald E. Brown, Mojtaba Heidarysafa, Kiana Jafari Meimandi, Matthew S. Gerber, and Laura E. Barnes. 2017. Hdltex: Hierarchical deep learning for text classification. In Proceedings of the 16th IEEE International Conference on Machine Learning and Applications (ICMLA\u201917). IEEE, 364--371."},{"key":"e_1_2_1_137_1","unstructured":"Xiaodong Liu Yelong Shen Kevin Duh and Jianfeng Gao. 2017. Stochastic answer networks for machine reading comprehension. Retrieved from https:\/\/arXiv:1712.03556."},{"key":"e_1_2_1_138_1","unstructured":"Rupesh Srivastava Klaus Greff and J\u00fcrgen Schmidhuber. 2015. Training very deep networks. In Advances in Neural Information Processing Systems. Retrieved from https:\/\/arxiv:1507.06228."},{"key":"e_1_2_1_139_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_140_1","volume-title":"Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI\u201916)","author":"Kim Yoon","unstructured":"Yoon Kim, Yacine Jernite, David Sontag, and Alexander M. Rush. 2016. Character-Aware neural language models. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI\u201916). Retrieved from https:\/\/arxiv:1508.06615."},{"key":"e_1_2_1_141_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML\u201917)","author":"Zilly Julian Georg","year":"2017","unstructured":"Julian Georg Zilly, Rupesh Kumar Srivastava, Jan Koutnik, and J\u00fcrgen Schmidhuber. 2017. Recurrent highway networks. In Proceedings of the 34th International Conference on Machine Learning (ICML\u201917). Retrieved from https:\/\/arxiv:1607.03474."},{"key":"e_1_2_1_142_1","unstructured":"Ying Wen Weinan Zhang Rui Luo and Jun Wang. 2016. Learning text representation using recurrent convolutional neural network with highway layers. Retrieved from https:\/\/arXiv:1606.06905."},{"key":"e_1_2_1_143_1","volume-title":"Natural language processing (almost) from scratch. J. Mach. Learn. Res.
12 (Aug","author":"Collobert Ronan","year":"2011","unstructured":"Ronan Collobert, Jason Weston, L\u00e9on Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12 (Aug. 2011), 2493--2537."},{"key":"e_1_2_1_144_1","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.","journal-title":"OpenAI Blog"},{"key":"e_1_2_1_145_1","unstructured":"Xipeng Qiu Tianxiang Sun Yige Xu Yunfan Shao Ning Dai and Xuanjing Huang. 2020. Pre-trained models for natural language processing: A survey. Retrieved from https:\/\/arXiv:2003.08271."},{"key":"e_1_2_1_146_1","volume-title":"Roberta: A robustly optimized bert pretraining approach.","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. Retrieved from https:\/\/arXiv:1907.11692.
"},{"key":"e_1_2_1_147_1","volume-title":"Albert: A lite bert for self-supervised learning of language representations.","author":"Lan Zhenzhong","year":"2019","unstructured":"Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. Albert: A lite bert for self-supervised learning of language representations. Retrieved from https:\/\/arXiv:1909.11942."},{"key":"e_1_2_1_148_1","unstructured":"Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2019. DistilBERT a distilled version of BERT: Smaller faster cheaper and lighter. Retrieved from https:\/\/arXiv:1910.01108."},{"key":"e_1_2_1_149_1","volume-title":"Spanbert: Improving pre-training by representing and predicting spans.","author":"Joshi Mandar","year":"2019","unstructured":"Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. 2019. Spanbert: Improving pre-training by representing and predicting spans. Retrieved from https:\/\/arXiv:1907.10529."},{"key":"e_1_2_1_150_1","volume-title":"Electra: Pre-training text encoders as discriminators rather than generators.","author":"Clark Kevin","year":"2020","unstructured":"Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. Retrieved from https:\/\/arXiv:2003.10555."},{"key":"e_1_2_1_151_1","volume-title":"Ernie: Enhanced representation through knowledge integration.","author":"Sun Yu","year":"2019","unstructured":"Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. Ernie: Enhanced representation through knowledge integration. Retrieved from https:\/\/arXiv:1904.09223."},{"key":"e_1_2_1_152_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6428"},{"key":"e_1_2_1_153_1","volume-title":"TANDA: Transfer and adapt pre-trained transformer models for answer sentence selection.","author":"Garg Siddhant","year":"2019","unstructured":"Siddhant Garg, Thuy Vu, and Alessandro Moschitti. 2019. TANDA: Transfer and adapt pre-trained transformer models for answer sentence selection. Retrieved from https:\/\/arXiv:1911.04118."},{"key":"e_1_2_1_154_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-32381-3_16"},{"key":"e_1_2_1_155_1","unstructured":"Zhuosheng Zhang Yuwei Wu Hai Zhao Zuchao Li Shuailiang Zhang Xi Zhou and Xiang Zhou. 2019. Semantics-aware BERT for language understanding. Retrieved from https:\/\/arXiv:1909.02209."},{"key":"e_1_2_1_156_1","volume-title":"Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems. MIT Press, 5754--5764."},{"key":"e_1_2_1_157_1","volume-title":"Advances in Neural Information Processing Systems","author":"Dong Li","unstructured":"Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. In Advances in Neural Information Processing Systems. MIT Press, 13042--13054."},{"key":"e_1_2_1_158_1","volume-title":"Ming Zhou et\u00a0al","author":"Bao Hangbo","year":"2020","unstructured":"Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou et\u00a0al. 2020. UniLMv2: Pseudo-masked language models for unified language model pre-training. Retrieved from https:\/\/arXiv:2002.12804."},{"key":"e_1_2_1_159_1","volume-title":"Liu","author":"Raffel Colin","year":"2019","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. Retrieved from https:\/\/arXiv:1910.10683."},{"key":"e_1_2_1_160_1","volume-title":"Williams","author":"Rumelhart David E.","year":"1985","unstructured":"David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1985. Learning Internal Representations by Error Propagation. Technical Report. University of California San Diego, La Jolla Institute for Cognitive Science."},{"key":"e_1_2_1_161_1","volume-title":"Advances in Neural Information Processing Systems","author":"Kiros Ryan","unstructured":"Ryan Kiros, Yukun Zhu, Russ R. Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-thought vectors. In Advances in Neural Information Processing Systems. MIT Press, 3294--3302."},{"key":"e_1_2_1_162_1","volume-title":"Le","author":"Dai Andrew M.","year":"2015","unstructured":"Andrew M. Dai and Quoc V. Le. 2015. Semi-supervised sequence learning. In Advances in Neural Information Processing Systems. Retrieved from https:\/\/arxiv:1511.01432."},{"key":"e_1_2_1_163_1","doi-asserted-by":"crossref","unstructured":"Minghua Zhang Yunfang Wu Weikang Li and Wei Li. 2019. Learning universal sentence representations with mean-max attention autoencoder. DOI:http:\/\/dx.doi.org\/10.18653\/v1\/d18-1481 Retrieved from https:\/\/arxiv:1809.06590.","DOI":"10.18653\/v1\/D18-1481"},{"key":"e_1_2_1_164_1","volume-title":"Proceedings of the 2nd International Conference on Learning Representations (ICLR\u201914)","author":"Diederik","unstructured":"Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR\u201914).
arxiv:1312.6114 Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR\u201914). arxiv:1312.6114"},{"key":"e_1_2_1_165_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML\u201914)","author":"Rezende Danilo Jimenez","year":"2014","unstructured":"Danilo Jimenez Rezende , Shakir Mohamed , and Daan Wierstra . 2014 . Stochastic backpropagation and approximate inference in deep generative models . Proceedings of the International Conference on Machine Learning (ICML\u201914) . Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. Proceedings of the International Conference on Machine Learning (ICML\u201914)."},{"key":"e_1_2_1_166_1","volume-title":"Proceedings of the International Conference on Machine Learning.","author":"Miao Yishu","year":"2016","unstructured":"Yishu Miao , Lei Yu , and Phil Blunsom . 2016 . Neural variational inference for text processing . In Proceedings of the International Conference on Machine Learning. Yishu Miao, Lei Yu, and Phil Blunsom. 2016. Neural variational inference for text processing. In Proceedings of the International Conference on Machine Learning."},{"key":"e_1_2_1_167_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K16-1002"},{"key":"e_1_2_1_168_1","doi-asserted-by":"crossref","unstructured":"Suchin Gururangan Tam Dang Dallas Card and Noah A Smith. 2019. Variational pretraining for semi-supervised text classification. Retrieved from https:\/\/arXiv:1906.02242.  Suchin Gururangan Tam Dang Dallas Card and Noah A Smith. 2019. Variational pretraining for semi-supervised text classification. 
Retrieved from https:\/\/arXiv:1906.02242.","DOI":"10.18653\/v1\/P19-1590"},{"key":"e_1_2_1_169_1","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271737"},{"key":"e_1_2_1_170_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.194"},{"key":"e_1_2_1_171_1","unstructured":"Ian J. Goodfellow Jonathon Shlens and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. Retrieved from https:\/\/arXiv:1412.6572.  Ian J. Goodfellow Jonathon Shlens and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. Retrieved from https:\/\/arXiv:1412.6572."},{"key":"e_1_2_1_172_1","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201916)","author":"Miyato Takeru","year":"2016","unstructured":"Takeru Miyato , Shin-ichi Maeda, Masanori Koyama , Ken Nakae , and Shin Ishii . 2016 . Distributional smoothing with virtual adversarial training . In Proceedings of the International Conference on Learning Representations (ICLR\u201916) . Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, and Shin Ishii. 2016. Distributional smoothing with virtual adversarial training. In Proceedings of the International Conference on Learning Representations (ICLR\u201916)."},{"key":"e_1_2_1_173_1","unstructured":"Takeru Miyato Andrew M. Dai and Ian Goodfellow. 2016. Adversarial training methods for semi-supervised text classification. Retrieved from https:\/\/arXiv:1605.07725.  Takeru Miyato Andrew M. Dai and Ian Goodfellow. 2016. Adversarial training methods for semi-supervised text classification. Retrieved from https:\/\/arXiv:1605.07725."},{"key":"e_1_2_1_174_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016940"},{"key":"e_1_2_1_175_1","unstructured":"Pengfei Liu Xipeng Qiu and Xuanjing Huang. 2017. Adversarial multi-task learning for text classification. Retrieved from https:\/\/arXiv:1704.05742.  Pengfei Liu Xipeng Qiu and Xuanjing Huang. 2017. 
Adversarial multi-task learning for text classification. Retrieved from https:\/\/arXiv:1704.05742."},{"key":"e_1_2_1_176_1","volume-title":"Barto","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G . Barto . 2018 . Reinforcement Learning : An Introduction. MIT Press . Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press."},{"key":"e_1_2_1_177_1","doi-asserted-by":"crossref","unstructured":"Tao Shen Tianyi Zhou Guodong Long Jing Jiang Sen Wang and Chengqi Zhang. 2018. Reinforced self-attention network: A hybrid of hard and soft attention for sequence modeling. Retrieved from https:\/\/arXiv:1801.10296.  Tao Shen Tianyi Zhou Guodong Long Jing Jiang Sen Wang and Chengqi Zhang. 2018. Reinforced self-attention network: A hybrid of hard and soft attention for sequence modeling. Retrieved from https:\/\/arXiv:1801.10296.","DOI":"10.24963\/ijcai.2018\/604"},{"key":"e_1_2_1_178_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2019.08.082"},{"key":"e_1_2_1_179_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098177"},{"key":"e_1_2_1_180_1","first-page":"301","article-title":"A generative model for category text generation. Info","volume":"450","author":"Li Yang","year":"2018","unstructured":"Yang Li , Quan Pan , Suhang Wang , Tao Yang , and Erik Cambria . 2018 . A generative model for category text generation. Info . Sci. 450 (2018), 301 -- 315 . Yang Li, Quan Pan, Suhang Wang, Tao Yang, and Erik Cambria. 2018. A generative model for category text generation. Info. Sci. 450 (2018), 301--315.","journal-title":"Sci."},{"key":"e_1_2_1_181_1","volume-title":"Proceedings of the 32nd AAAI Conference on Artificial Intelligence.","author":"Zhang Tianyang","year":"2018","unstructured":"Tianyang Zhang , Minlie Huang , and Li Zhao . 2018 . Learning structured representation for text classification via reinforcement learning . 
In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Tianyang Zhang, Minlie Huang, and Li Zhao. 2018. Learning structured representation for text classification via reinforcement learning. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence."},{"key":"e_1_2_1_182_1","doi-asserted-by":"crossref","unstructured":"Yu Gu Robert Tinn Hao Cheng Michael Lucas Naoto Usuyama Xiaodong Liu Tristan Naumann Jianfeng Gao and Hoifung Poon. 2020. Domain-specific language model pretraining for biomedical natural language processing. Retrieved from https:\/\/arXiv:2007.15779.  Yu Gu Robert Tinn Hao Cheng Michael Lucas Naoto Usuyama Xiaodong Liu Tristan Naumann Jianfeng Gao and Hoifung Poon. 2020. Domain-specific language model pretraining for biomedical natural language processing. Retrieved from https:\/\/arXiv:2007.15779.","DOI":"10.1145\/3458754"},{"key":"e_1_2_1_183_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.202"},{"key":"e_1_2_1_184_1","unstructured":"Raphael Tang Yao Lu Linqing Liu Lili Mou Olga Vechtomova and Jimmy Lin. 2019. Distilling task-specific knowledge from BERT into simple neural networks. Retrieved from https:\/\/arXiv:1903.12136.  Raphael Tang Yao Lu Linqing Liu Lili Mou Olga Vechtomova and Jimmy Lin. 2019. Distilling task-specific knowledge from BERT into simple neural networks. Retrieved from https:\/\/arXiv:1903.12136."},{"key":"e_1_2_1_185_1","unstructured":"kaggle.[n. d.]. Retrieved from https:\/\/www.kaggle.com\/yelp-dataset\/yelp-dataset.  kaggle.[n. d.]. Retrieved from https:\/\/www.kaggle.com\/yelp-dataset\/yelp-dataset."},{"key":"e_1_2_1_186_1","unstructured":"kaggle. [n. d.]. Retrieved from https:\/\/www.kaggle.com\/lakshmi25npathi\/imdb-dataset-of-50k-movie-reviews.  kaggle. [n. d.]. 
Retrieved from https:\/\/www.kaggle.com\/lakshmi25npathi\/imdb-dataset-of-50k-movie-reviews."},{"key":"e_1_2_1_187_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118693.1118704"},{"key":"e_1_2_1_188_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/N15-1146"},{"key":"e_1_2_1_189_1","unstructured":"kaggle. [n.d.]. Retrieved from https:\/\/www.kaggle.com\/datafiniti\/consumer-reviews-of-amazon-products.  kaggle. [n.d.]. Retrieved from https:\/\/www.kaggle.com\/datafiniti\/consumer-reviews-of-amazon-products."},{"key":"e_1_2_1_190_1","unstructured":"20\n    Newsgroups. [n.d.]. Retrieved from http:\/\/qwone.com\/jason\/20Newsgroups\/.  20 Newsgroups. [n.d.]. Retrieved from http:\/\/qwone.com\/jason\/20Newsgroups\/."},{"key":"e_1_2_1_191_1","unstructured":"Reuters. [n.d.]. Retrieved from https:\/\/martin-thoma.com\/nlp-reuters.  Reuters. [n.d.]. Retrieved from https:\/\/martin-thoma.com\/nlp-reuters."},{"key":"e_1_2_1_192_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661829.2662067"},{"key":"e_1_2_1_193_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143892"},{"key":"e_1_2_1_194_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242610"},{"key":"e_1_2_1_195_1","doi-asserted-by":"publisher","DOI":"10.3233\/SW-140134"},{"key":"e_1_2_1_196_1","unstructured":"Ohsumed. [n.d.]. Retrieved from http:\/\/davis.wpi.edu\/xmdv\/datasets\/ohsumed.html.  Ohsumed. [n.d.]. Retrieved from http:\/\/davis.wpi.edu\/xmdv\/datasets\/ohsumed.html."},{"key":"e_1_2_1_197_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87481-2_4"},{"key":"e_1_2_1_198_1","doi-asserted-by":"crossref","unstructured":"Zhiyong Lu. 2011. PubMed and beyond: A survey of web tools for searching biomedical literature. Retrieved from https:\/\/pubmed.ncbi.nlm.nih.gov\/21245076\/.  Zhiyong Lu. 2011. PubMed and beyond: A survey of web tools for searching biomedical literature. 
Retrieved from https:\/\/pubmed.ncbi.nlm.nih.gov\/21245076\/.","DOI":"10.1093\/database\/baq036"},{"key":"e_1_2_1_199_1","unstructured":"Franck Dernoncourt and Ji Young Lee. 2017. Pubmed 200k rct: A dataset for sequential sentence classification in medical abstracts. Retrieved from https:\/\/arXiv:1710.06071.  Franck Dernoncourt and Ji Young Lee. 2017. Pubmed 200k rct: A dataset for sequential sentence classification in medical abstracts. Retrieved from https:\/\/arXiv:1710.06071."},{"key":"e_1_2_1_200_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-2084"},{"key":"e_1_2_1_201_1","doi-asserted-by":"crossref","unstructured":"Pranav Rajpurkar Robin Jia and Percy Liang. 2018. Know what you don\u2019t know: Unanswerable questions for SQuAD. Retrieved from https:\/\/arXiv preprint:1806.03822.  Pranav Rajpurkar Robin Jia and Percy Liang. 2018. Know what you don\u2019t know: Unanswerable questions for SQuAD. Retrieved from https:\/\/arXiv preprint:1806.03822.","DOI":"10.18653\/v1\/P18-2124"},{"key":"e_1_2_1_202_1","volume-title":"MS MARCO: A human-generated machine reading comprehension dataset. CoCo@ NIPS.","author":"Nguyen Tri","year":"2016","unstructured":"Tri Nguyen , Mir Rosenberg , Xia Song , Jianfeng Gao , Saurabh Tiwary , Rangan Majumder , and Li Deng . 2016 . MS MARCO: A human-generated machine reading comprehension dataset. CoCo@ NIPS. Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human-generated machine reading comprehension dataset. CoCo@ NIPS."},{"key":"e_1_2_1_203_1","unstructured":"University of Pennsylvania [n.d.]. Retrieved from https:\/\/cogcomp.seas.upenn.edu\/Data\/QA\/QC\/.  University of Pennsylvania [n.d.]. Retrieved from https:\/\/cogcomp.seas.upenn.edu\/Data\/QA\/QC\/."},{"key":"e_1_2_1_204_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1237"},{"key":"e_1_2_1_205_1","unstructured":"Quora. [n.d.]. 
Retrieved from https:\/\/data.quora.com\/First-Quora-Dataset-Release-QuestionPairs.  Quora. [n.d.]. Retrieved from https:\/\/data.quora.com\/First-Quora-Dataset-Release-QuestionPairs."},{"key":"e_1_2_1_206_1","volume-title":"Swag: A large-scale adversarial dataset for grounded commonsense inference.","author":"Zellers Rowan","year":"2018","unstructured":"Rowan Zellers , Yonatan Bisk , Roy Schwartz , and Yejin Choi . 2018 . Swag: A large-scale adversarial dataset for grounded commonsense inference. Retrieved from https:\/\/arXiv:1808.05326. Rowan Zellers, Yonatan Bisk, Roy Schwartz, and Yejin Choi. 2018. Swag: A large-scale adversarial dataset for grounded commonsense inference. Retrieved from https:\/\/arXiv:1808.05326."},{"key":"e_1_2_1_207_1","volume-title":"2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 820--827","author":"Jurczyk Tomasz","unstructured":"Tomasz Jurczyk , Michael Zhai , and Jinho D. Choi . 2016. Selqa: A new benchmark for selection-based question answering . In 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 820--827 . Tomasz Jurczyk, Michael Zhai, and Jinho D. Choi. 2016. Selqa: A new benchmark for selection-based question answering. In 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 820--827."},{"key":"e_1_2_1_208_1","volume-title":"Manning","author":"Bowman Samuel R.","year":"2015","unstructured":"Samuel R. Bowman , Gabor Angeli , Christopher Potts , and Christopher D . Manning . 2015 . A large annotated corpus for learning natural language inference. Retrieved from https:\/\/arXiv:1508.05326. Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. Retrieved from https:\/\/arXiv:1508.05326."},{"key":"e_1_2_1_209_1","doi-asserted-by":"crossref","unstructured":"Adina Williams Nikita Nangia and Samuel R Bowman. 2017. 
A broad-coverage challenge corpus for sentence understanding through inference. Retrieved from https:\/\/arXiv:1704.05426.  Adina Williams Nikita Nangia and Samuel R Bowman. 2017. A broad-coverage challenge corpus for sentence understanding through inference. Retrieved from https:\/\/arXiv:1704.05426.","DOI":"10.18653\/v1\/N18-1101"},{"key":"e_1_2_1_210_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220406"},{"key":"e_1_2_1_211_1","unstructured":"Daniel Cer Mona Diab Eneko Agirre Inigo Lopez-Gazpio and Lucia Specia. 2017. Semeval-2017 task 1: Semantic textual similarity-multilingual and cross-lingual focused evaluation. Retrieved from https:\/\/arXiv:1708.00055.  Daniel Cer Mona Diab Eneko Agirre Inigo Lopez-Gazpio and Lucia Specia. 2017. Semeval-2017 task 1: Semantic textual similarity-multilingual and cross-lingual focused evaluation. Retrieved from https:\/\/arXiv:1708.00055."},{"key":"e_1_2_1_212_1","series-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). DOI:http:\/\/dx.doi.org\/10.1007\/11736790_9","volume-title":"The PASCAL recognising textual entailment challenge","author":"Dagan Ido","unstructured":"Ido Dagan , Oren Glickman , and Bernardo Magnini . 2006. The PASCAL recognising textual entailment challenge . In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). DOI:http:\/\/dx.doi.org\/10.1007\/11736790_9 Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006. The PASCAL recognising textual entailment challenge. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 
DOI:http:\/\/dx.doi.org\/10.1007\/11736790_9"},{"key":"e_1_2_1_213_1","volume-title":"Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI\u201918)","author":"Khot Tushar","year":"2018","unstructured":"Tushar Khot , Ashish Sabharwal , and Peter Clark . 2018 . Scitail: A textual entailment dataset from science question answering . In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI\u201918) . Tushar Khot, Ashish Sabharwal, and Peter Clark. 2018. Scitail: A textual entailment dataset from science question answering. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI\u201918)."},{"key":"e_1_2_1_214_1","doi-asserted-by":"publisher","DOI":"10.5555\/2002472.2002491"},{"key":"e_1_2_1_215_1","volume-title":"Proceedings of the 3rd International AAAI Conference on Weblogs and Social Media.","author":"Martineau Justin Christopher","year":"2009","unstructured":"Justin Christopher Martineau and Tim Finin . 2009 . Delta tfidf: An improved feature space for sentiment analysis . In Proceedings of the 3rd International AAAI Conference on Weblogs and Social Media. Justin Christopher Martineau and Tim Finin. 2009. Delta tfidf: An improved feature space for sentiment analysis. In Proceedings of the 3rd International AAAI Conference on Weblogs and Social Media."},{"key":"e_1_2_1_216_1","doi-asserted-by":"crossref","unstructured":"Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. Retrieved from https:\/\/arXiv:1801.06146.  Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. Retrieved from https:\/\/arXiv:1801.06146.","DOI":"10.18653\/v1\/P18-1031"},{"key":"e_1_2_1_217_1","volume-title":"Advances in Neural Information Processing Systems","author":"McCann Bryan","unstructured":"Bryan McCann , James Bradbury , Caiming Xiong , and Richard Socher . 2017. Learned in translation: Contextualized word vectors . 
In Advances in Neural Information Processing Systems . MIT Press , 6294--6305. Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: Contextualized word vectors. In Advances in Neural Information Processing Systems. MIT Press, 6294--6305."},{"key":"e_1_2_1_218_1","volume-title":"Kingma","author":"Gray Scott","year":"2017","unstructured":"Scott Gray , Alec Radford , and Diederik P . Kingma . 2017 . Gpu kernels for block-sparse weights. Retrieved from https:\/\/arXiv:1711.09224. Scott Gray, Alec Radford, and Diederik P. Kingma. 2017. Gpu kernels for block-sparse weights. Retrieved from https:\/\/arXiv:1711.09224."},{"key":"e_1_2_1_219_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33014763"},{"key":"e_1_2_1_220_1","unstructured":"Qizhe Xie Zihang Dai Eduard Hovy Minh-Thang Luong and Quoc V Le. 2019. Unsupervised data augmentation. Retrieved from https:\/\/arXiv:1904.12848.  Qizhe Xie Zihang Dai Eduard Hovy Minh-Thang Luong and Quoc V Le. 2019. Unsupervised data augmentation. Retrieved from https:\/\/arXiv:1904.12848."},{"key":"e_1_2_1_221_1","volume-title":"Proceedings of the International Conference on Machine Learning. 957--966","author":"Kusner Matt","year":"2015","unstructured":"Matt Kusner , Yu Sun , Nicholas Kolkin , and Kilian Weinberger . 2015 . From word embeddings to document distances . In Proceedings of the International Conference on Machine Learning. 957--966 . Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From word embeddings to document distances. In Proceedings of the International Conference on Machine Learning. 957--966."},{"key":"e_1_2_1_222_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. 193--203","author":"Richardson Matthew","year":"2013","unstructured":"Matthew Richardson , Christopher J. C. Burges , and Erin Renshaw . 2013 . Mctest: A challenge dataset for the open-domain machine comprehension of text . 
In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 193--203 . Matthew Richardson, Christopher J. C. Burges, and Erin Renshaw. 2013. Mctest: A challenge dataset for the open-domain machine comprehension of text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 193--203."},{"key":"e_1_2_1_223_1","volume-title":"Fusionnet: Fusing via fully-aware attention with application to machine comprehension.","author":"Huang Hsin-Yuan","year":"2017","unstructured":"Hsin-Yuan Huang , Chenguang Zhu , Yelong Shen , and Weizhu Chen . 2017 . Fusionnet: Fusing via fully-aware attention with application to machine comprehension. Retrieved from https:\/\/arXiv:1711.07341. Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, and Weizhu Chen. 2017. Fusionnet: Fusing via fully-aware attention with application to machine comprehension. Retrieved from https:\/\/arXiv:1711.07341."},{"key":"e_1_2_1_224_1","doi-asserted-by":"crossref","unstructured":"Qian Chen Xiaodan Zhu Zhen-Hua Ling Si Wei Hui Jiang and Diana Inkpen. 2017. Recurrent neural network-based sentence encoder with gated attention for natural language inference. Retrieved from https:\/\/arXiv:1708.01353  Qian Chen Xiaodan Zhu Zhen-Hua Ling Si Wei Hui Jiang and Diana Inkpen. 2017. Recurrent neural network-based sentence encoder with gated attention for natural language inference. Retrieved from https:\/\/arXiv:1708.01353","DOI":"10.18653\/v1\/W17-5307"},{"key":"e_1_2_1_225_1","unstructured":"Boyuan Pan Yazheng Yang Zhou Zhao Yueting Zhuang Deng Cai and Xiaofei He. 2019. Discourse marker augmented network with reinforcement learning for natural language inference. Retrieved from https:\/\/arXiv:1907.09692.  Boyuan Pan Yazheng Yang Zhou Zhao Yueting Zhuang Deng Cai and Xiaofei He. 2019. Discourse marker augmented network with reinforcement learning for natural language inference. 
Retrieved from https:\/\/arXiv:1907.09692."},{"key":"e_1_2_1_226_1","unstructured":"Wenhui Wang Furu Wei Li Dong Hangbo Bao Nan Yang and Ming Zhou. 2020. MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. Retrieved from https:\/\/arXiv:2002.10957.  Wenhui Wang Furu Wei Li Dong Hangbo Bao Nan Yang and Ming Zhou. 2020. MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. Retrieved from https:\/\/arXiv:2002.10957."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3439726","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3439726","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:52Z","timestamp":1750197712000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3439726"}},"subtitle":["A Comprehensive Review"],"short-title":[],"issued":{"date-parts":[[2021,4,17]]},"references-count":226,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,4,30]]}},"alternative-id":["10.1145\/3439726"],"URL":"https:\/\/doi.org\/10.1145\/3439726","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,17]]},"assertion":[{"value":"2020-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}