{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T00:45:45Z","timestamp":1773967545377,"version":"3.50.1"},"reference-count":58,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2022,4,11]],"date-time":"2022-04-11T00:00:00Z","timestamp":1649635200000},"content-version":"vor","delay-in-days":100,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,4,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Standard multi-task benchmarks are essential for developing pretraining models that can generalize to various downstream tasks. Existing benchmarks for natural language processing (NLP) usually focus only on understanding or generating short texts. However, long text modeling requires many distinct abilities in contrast to short texts, such as the modeling of long-range discourse and commonsense relations, and the coherence and controllability of generation. The lack of standardized benchmarks makes it difficult to assess these abilities of a model and fairly compare different models, especially Chinese models. Therefore, we propose a story-centric benchmark named LOT for evaluating Chinese long text modeling, which aggregates two understanding tasks and two generation tasks. We construct new datasets for these tasks based on human-written Chinese stories with hundreds of words. Furthermore, we release an encoder-decoder-based Chinese long text pretraining model named LongLM with up to 1 billion parameters. We pretrain LongLM on 120G Chinese novels with two generative tasks including text infilling and conditional continuation. 
Extensive experiments show that LongLM outperforms similar-sized pretraining models substantially on both the understanding and generation tasks in LOT.<\/jats:p>","DOI":"10.1162\/tacl_a_00469","type":"journal-article","created":{"date-parts":[[2022,4,11]],"date-time":"2022-04-11T19:40:38Z","timestamp":1649706038000},"page":"434-451","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":15,"title":["LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation"],"prefix":"10.1162","volume":"10","author":[{"given":"Jian","family":"Guan","sequence":"first","affiliation":[{"name":"The CoAI group, DCST, China. j-guan19@mails.tsinghua.edu.cn"}]},{"given":"Zhuoer","family":"Feng","sequence":"additional","affiliation":[{"name":"The CoAI group, DCST, China. fze17@mails.tsinghua.edu.cn"}]},{"given":"Yamei","family":"Chen","sequence":"additional","affiliation":[{"name":"The CoAI group, DCST, China. chenziym4132013@163.com"}]},{"given":"Ruilin","family":"He","sequence":"additional","affiliation":[{"name":"Huawei Technologies Co., Ltd., China. heruilin@huawei.com"}]},{"given":"Xiaoxi","family":"Mao","sequence":"additional","affiliation":[{"name":"Netease Fuxi AI Lab., China. maoxiaoxi@corp.netease.com"}]},{"given":"Changjie","family":"Fan","sequence":"additional","affiliation":[{"name":"Netease Fuxi AI Lab., China. fanchangjie@corp.netease.com"}]},{"given":"Minlie","family":"Huang","sequence":"additional","affiliation":[{"name":"The CoAI group, DCST, China. 
aihuang@tsinghua.edu.cn"}]}],"member":"281","published-online":{"date-parts":[[2022,4,11]]},"reference":[{"key":"2022041119402908200_bib1","first-page":"1202","article-title":"Automatic extraction of social networks from literary text: A case study on alice in wonderland","volume-title":"Proceedings of the Sixth International Joint Conference on Natural Language Processing","author":"Agarwal","year":"2013"},{"key":"2022041119402908200_bib2","doi-asserted-by":"publisher","first-page":"6470","DOI":"10.18653\/v1\/2020.emnlp-main.525","article-title":"STORIUM: A Dataset and evaluation platform for machine-in-the-loop story generation","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Akoury","year":"2020"},{"key":"2022041119402908200_bib3","first-page":"352","article-title":"Learning latent personas of film characters","volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Bamman","year":"2013"},{"key":"2022041119402908200_bib4","article-title":"Abductive commonsense reasoning","volume-title":"International Conference on Learning Representations","author":"Bhagavatula","year":"2019"},{"key":"2022041119402908200_bib5","doi-asserted-by":"publisher","first-page":"5277","DOI":"10.18653\/v1\/2020.emnlp-main.426","article-title":"Modeling protagonist emotions for emotion-aware storytelling","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Brahman","year":"2020"},{"key":"2022041119402908200_bib6","article-title":"Language models are few-shot learners","author":"Brown","year":"2020"},{"key":"2022041119402908200_bib7","first-page":"789","article-title":"Unsupervised learning of narrative event chains","volume-title":"Proceedings of ACL-08: 
HLT","author":"Chambers","year":"2008"},{"key":"2022041119402908200_bib8","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v31i1.10982","article-title":"Unsupervised learning of evolving relationships between literary characters","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Chaturvedi","year":"2017"},{"key":"2022041119402908200_bib9","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v30i1.10358","article-title":"Modeling evolving relationships between characters in literary novels","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Chaturvedi","year":"2016"},{"key":"2022041119402908200_bib10","doi-asserted-by":"publisher","first-page":"649","DOI":"10.18653\/v1\/D19-1060","article-title":"Evaluation benchmarks and learning criteria for discourse-aware sentence representations","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Chen","year":"2019"},{"key":"2022041119402908200_bib11","article-title":"Senteval: An evaluation toolkit for universal sentence representations","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)","author":"Conneau","year":"2018"},{"key":"2022041119402908200_bib12","first-page":"657","article-title":"Revisiting pre-trained models for Chinese natural language processing","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings","author":"Cui","year":"2020"},{"key":"2022041119402908200_bib13","doi-asserted-by":"crossref","first-page":"2978","DOI":"10.18653\/v1\/P19-1285","article-title":"Transformer-xl: Attentive language models beyond a fixed-length context","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational 
Linguistics","author":"Dai","year":"2019"},{"key":"2022041119402908200_bib14","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2022041119402908200_bib15","doi-asserted-by":"crossref","first-page":"889","DOI":"10.18653\/v1\/P18-1082","article-title":"Hierarchical neural story generation","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Fan","year":"2018"},{"key":"2022041119402908200_bib16","unstructured":"Mark Alan Finlayson. 2012. Learning narrative structure from annotated folktales. Ph.D. thesis, Massachusetts Institute of Technology."},{"issue":"5","key":"2022041119402908200_bib17","doi-asserted-by":"publisher","first-page":"378","DOI":"10.1037\/h0031619","article-title":"Measuring nominal scale agreement among many raters.","volume":"76","author":"Fleiss","year":"1971","journal-title":"Psychological Bulletin"},{"key":"2022041119402908200_bib18","first-page":"1243","article-title":"Convolutional sequence to sequence learning","volume-title":"International Conference on Machine Learning","author":"Gehring","year":"2017"},{"key":"2022041119402908200_bib19","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.gem-1.10","article-title":"The GEM benchmark: Natural language generation, its evaluation and metrics","author":"Gehrmann","year":"2021","journal-title":"arXiv preprint arXiv:2102.01672"},{"key":"2022041119402908200_bib20","first-page":"2672","article-title":"Generative adversarial nets","volume-title":"Advances in Neural Information Processing 
Systems","author":"Goodfellow","year":"2014"},{"key":"2022041119402908200_bib21","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1162\/tacl_a_00302","article-title":"A knowledge-enhanced pretraining model for commonsense story generation","volume":"8","author":"Guan","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2022041119402908200_bib22","doi-asserted-by":"publisher","first-page":"9157","DOI":"10.18653\/v1\/2020.emnlp-main.736","article-title":"UNION: An unreferenced metric for evaluating open-ended story generation","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020","author":"Guan","year":"2020"},{"key":"2022041119402908200_bib23","doi-asserted-by":"publisher","first-page":"6473","DOI":"10.1609\/aaai.v33i01.33016473","article-title":"Story ending generation with incremental encoding and commonsense knowledge","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Guan","year":"2019"},{"key":"2022041119402908200_bib24","doi-asserted-by":"publisher","first-page":"6394","DOI":"10.18653\/v1\/2021.acl-long.500","article-title":"OpenMEVA: A benchmark for evaluating open-ended story generation metrics","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Guan","year":"2021"},{"key":"2022041119402908200_bib25","doi-asserted-by":"publisher","first-page":"2430","DOI":"10.18653\/v1\/2021.findings-acl.215","article-title":"Stylized story generation with style-guided planning","volume-title":"Findings of the Association for Computational Linguistics: ACL-IJCNLP 
2021","author":"Kong","year":"2021"},{"key":"2022041119402908200_bib26","doi-asserted-by":"publisher","first-page":"66","DOI":"10.18653\/v1\/D18-2012","article-title":"Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Kudo","year":"2018"},{"key":"2022041119402908200_bib27","doi-asserted-by":"publisher","first-page":"7871","DOI":"10.18653\/v1\/2020.acl-main.703","article-title":"BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020","author":"Lewis","year":"2020"},{"key":"2022041119402908200_bib28","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v27i1.8649","article-title":"Story generation with crowdsourced plot graphs","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Li","year":"2013"},{"key":"2022041119402908200_bib29","first-page":"110","article-title":"A diversity-promoting objective function for neural conversation models","volume-title":"Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Li","year":"2016"},{"key":"2022041119402908200_bib30","unstructured":"Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74\u201381, Barcelona, Spain. 
Association for Computational Linguistics."},{"key":"2022041119402908200_bib31","article-title":"GLGE: A new general language generation evaluation benchmark","author":"Liu","year":"2020","journal-title":"arXiv preprint arXiv:2011.11928"},{"key":"2022041119402908200_bib32","doi-asserted-by":"publisher","first-page":"708","DOI":"10.18653\/v1\/N18-2111","article-title":"Deep dungeons and dragons: Learning character-action interactions from role-playing game transcripts","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)","author":"Louis","year":"2018"},{"key":"2022041119402908200_bib33","article-title":"Pointer sentinel mixture models","author":"Merity","year":"2016","journal-title":"arXiv preprint arXiv:1609.07843"},{"key":"2022041119402908200_bib34","doi-asserted-by":"publisher","first-page":"839","DOI":"10.18653\/v1\/N16-1098","article-title":"A corpus and cloze evaluation for deeper understanding of commonsense stories","volume-title":"Proceedings of NAACL-HLT","author":"Mostafazadeh","year":"2016"},{"key":"2022041119402908200_bib35","first-page":"311","article-title":"BLEU: A method for automatic evaluation of machine translation","volume-title":"Proceedings of the 40th annual meeting of the Association for Computational Linguistics","author":"Papineni","year":"2002"},{"key":"2022041119402908200_bib36","doi-asserted-by":"publisher","first-page":"5086","DOI":"10.18653\/v1\/2021.acl-long.395","article-title":"COINS: Dynamically generating COntextualized inference rules for narrative story completion","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Paul","year":"2021"},{"key":"2022041119402908200_bib37","article-title":"Improving language understanding with unsupervised 
learning","author":"Radford","year":"2018"},{"issue":"8","key":"2022041119402908200_bib38","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI blog"},{"key":"2022041119402908200_bib39","article-title":"Compressive transformers for long-range sequence modelling","volume-title":"International Conference on Learning Representations","author":"Rae","year":"2020"},{"key":"2022041119402908200_bib40","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"key":"2022041119402908200_bib41","doi-asserted-by":"publisher","first-page":"4274","DOI":"10.18653\/v1\/2020.emnlp-main.349","article-title":"Plotmachines: Outline-conditioned generation with dynamic plot state tracking","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Rashkin","year":"2020"},{"key":"2022041119402908200_bib42","doi-asserted-by":"publisher","first-page":"4902","DOI":"10.18653\/v1\/2020.acl-main.442","article-title":"Beyond accuracy: Behavioral testing of NLP models with CheckList","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Ribeiro","year":"2020"},{"key":"2022041119402908200_bib43","article-title":"Reasoning about entailment with neural attention","volume-title":"4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings","author":"Rockt\u00e4schel","year":"2016"},{"key":"2022041119402908200_bib44","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1002\/9780470689646.ch1","article-title":"Automatic keyword extraction from individual documents","volume":"1","author":"Rose","year":"2010","journal-title":"Text Mining: 
Applications and Theory"},{"key":"2022041119402908200_bib45","doi-asserted-by":"publisher","first-page":"4938","DOI":"10.1109\/CVPR42600.2020.00499","article-title":"Superglue: Learning feature matching with graph neural networks","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Sarlin","year":"2020"},{"key":"2022041119402908200_bib46","doi-asserted-by":"publisher","first-page":"15","DOI":"10.18653\/v1\/K17-1004","article-title":"The effect of different writing tasks on linguistic style: A case study of the ROC story cloze task","volume-title":"Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)","author":"Schwartz","year":"2017"},{"key":"2022041119402908200_bib47","doi-asserted-by":"publisher","first-page":"752","DOI":"10.18653\/v1\/P18-2119","article-title":"Tackling the story ending biases in the story cloze test","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Sharma","year":"2018"},{"key":"2022041119402908200_bib48","article-title":"Long range arena: A benchmark for efficient transformers","volume-title":"International Conference on Learning Representations","author":"Tay","year":"2020"},{"key":"2022041119402908200_bib49","first-page":"5998","article-title":"Attention is all you need","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"2022041119402908200_bib50","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5446","article-title":"GLUE: A multi-task benchmark and analysis platform for natural language understanding","volume-title":"International Conference on Learning Representations","author":"Wang","year":"2019"},{"key":"2022041119402908200_bib51","first-page":"5233","article-title":"T-CVAE: Transformer-based conditioned variational autoencoder for story completion","volume-title":"Proceedings of the 
Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019","author":"Wang","year":"2019"},{"key":"2022041119402908200_bib52","first-page":"4762","article-title":"CLUE: A Chinese language understanding evaluation benchmark","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"Xu","year":"2020"},{"key":"2022041119402908200_bib53","first-page":"2831","article-title":"MEGATRON-CNTRL: Controllable story generation with external knowledge using large-scale language models","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020","author":"Xu","year":"2020"},{"key":"2022041119402908200_bib54","first-page":"483","article-title":"mT5: A massively multilingual pre-trained text-to-text transformer","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Xue","year":"2021"},{"key":"2022041119402908200_bib55","doi-asserted-by":"publisher","first-page":"7378","DOI":"10.1609\/aaai.v33i01.33017378","article-title":"Plan-and-write: Towards better automatic storytelling","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Yao","year":"2019"},{"key":"2022041119402908200_bib56","doi-asserted-by":"publisher","DOI":"10.1016\/j.aiopen.2021.07.001","article-title":"CPM: A large-scale generative Chinese pre-trained language model","author":"Zhang","year":"2020","journal-title":"arXiv preprint arXiv:2012.00413"},{"key":"2022041119402908200_bib57","doi-asserted-by":"crossref","first-page":"654","DOI":"10.18653\/v1\/P17-1061","article-title":"Learning discourse-level diversity for neural dialog models using conditional variational autoencoders","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational 
Linguistics (Volume 1: Long Papers)","author":"Zhao","year":"2017"},{"key":"2022041119402908200_bib58","doi-asserted-by":"publisher","first-page":"241","DOI":"10.18653\/v1\/D19-3041","article-title":"UER: An open-source toolkit for pre-training models","author":"Zhao","year":"2019","journal-title":"EMNLP-IJCNLP 2019"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00469\/2008054\/tacl_a_00469.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00469\/2008054\/tacl_a_00469.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T13:01:58Z","timestamp":1675256518000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00469\/110537\/LOT-A-Story-Centric-Benchmark-for-Evaluating"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"references-count":58,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00469","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022]]},"published":{"date-parts":[[2022]]}}}