{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:13:20Z","timestamp":1760242400243,"version":"build-2065373602"},"reference-count":57,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2017,6,12]],"date-time":"2017-06-12T00:00:00Z","timestamp":1497225600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Text summarization namely, automatically generating a short summary of a given document, is a difficult task in natural language processing. Nowadays, deep learning as a new technique has gradually been deployed for text summarization, but there is still a lack of large-scale high quality datasets for this technique. In this paper, we proposed a novel deep learning method to identify high quality document\u2013summary pairs for building a large-scale pairs dataset. Concretely, a long short-term memory (LSTM)-based model was designed to measure the quality of document\u2013summary pairs. In order to leverage information across all parts of each document, we further proposed an improved LSTM-based model by removing the forget gate in the LSTM unit. Experiments conducted on the training set and the test set built upon Sina Weibo (a Chinese microblog website similar to Twitter) showed that the LSTM-based models significantly outperformed baseline models with regard to the area under receiver operating characteristic curve (AUC) value.<\/jats:p>","DOI":"10.3390\/info8020064","type":"journal-article","created":{"date-parts":[[2017,6,12]],"date-time":"2017-06-12T10:27:59Z","timestamp":1497263279000},"page":"64","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Identifying High Quality Document\u2013Summary Pairs through Text Matching"],"prefix":"10.3390","volume":"8","author":[{"given":"Yongshuai","family":"Hou","sequence":"first","affiliation":[{"name":"Intelligence Computing Research Center, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yang","family":"Xiang","sequence":"additional","affiliation":[{"name":"Intelligence Computing Research Center, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Buzhou","family":"Tang","sequence":"additional","affiliation":[{"name":"Intelligence Computing Research Center, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qingcai","family":"Chen","sequence":"additional","affiliation":[{"name":"Intelligence Computing Research Center, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaolong","family":"Wang","sequence":"additional","affiliation":[{"name":"Intelligence Computing Research Center, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fangze","family":"Zhu","sequence":"additional","affiliation":[{"name":"Intelligence Computing Research Center, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2017,6,12]]},"reference":[{"key":"ref_1","unstructured":"Hovy, E., and Lin, C.Y. (1998, January 13\u201315). Automated Text Summarization and the SUMMARIST System. Proceedings of the Workshop on TIPSTER\u201998, Baltimore, MD, USA."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s10462-016-9475-9","article-title":"Recent automatic text summarization techniques: A survey","volume":"47","author":"Gambhir","year":"2017","journal-title":"Artif. Intell. Rev."},{"key":"ref_3","first-page":"21","article-title":"Trends in extractive and abstractive techniques in text summarization","volume":"117","author":"Bhatia","year":"2015","journal-title":"Int. J. Comput. Appl."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Hu, B., Chen, Q., and Zhu, F. (2015, January 17\u201321). LCSTS: A Large Scale Chinese Short Text Summarization Dataset. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1229"},{"key":"ref_5","unstructured":"Hu, B., Lu, Z., Li, H., and Chen, Q. (arXiv, 2015). Convolutional Neural Network Architectures for Matching Natural Language Sentences, arXiv."},{"key":"ref_6","unstructured":"Bahdanau, D., Cho, K., and Bengio, Y. (arXiv, 2014). Neural Machine Translation by Jointly Learning to Align and Translate, arXiv."},{"key":"ref_7","unstructured":"Yu, L., Hermann, K.M., Blunsom, P., and Pulman, S. (arXiv, 2014). Deep Learning for Answer Sentence Selection, arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhou, X., Hu, B., Chen, Q., Tang, B., and Wang, X. (arXiv, 2015). Answer Sequence Learning with Neural Networks for Answer Selection in Community Question Answering, arXiv.","DOI":"10.3115\/v1\/P15-2117"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhou, X., Hu, B., Chen, Q., and Wang, X. (2015, January 9\u201312). An Auto-Encoder for Learning Conversation Representation Using LSTM. Proceedings of the International Conference on Neural Information Processing, Istanbul, Turkey.","DOI":"10.1007\/978-3-319-26532-2_34"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Liu, P., Qiu, X., Chen, J., and Huang, X. (2016, January 7\u201312). Deep Fusion LSTMs for Text Semantic Matching. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany.","DOI":"10.18653\/v1\/P16-1098"},{"key":"ref_11","unstructured":"Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (arXiv, 2014). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Tang, D., Qin, B., and Liu, T. (2015, January 17\u201321). Document Modeling with Gated Recurrent Neural Network for Sentiment Classification. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1167"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_14","first-page":"192","article-title":"A survey on automatic text summarization","volume":"4","author":"Das","year":"2007","journal-title":"Lit. Surv. Lang. Stat. II Course CMU"},{"key":"ref_15","first-page":"7889","article-title":"A survey on automatic text summarization","volume":"5","author":"Saranyamol","year":"2014","journal-title":"Int. J. Comput. Sci. Inf. Technol."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Schluter, N., and S\u00f8gaard, A. (2015, January 26\u201331). Unsupervised extractive summarization via coverage maximization with syntactic and semantic concepts. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China.","DOI":"10.3115\/v1\/P15-2138"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Cao, Z., Wei, F., Li, S., Li, W., Zhou, M., and Wang, H. (2015, January 26\u201331). Learning Summary Prior Representation for Extractive Summarization. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language, Beijing, China.","DOI":"10.3115\/v1\/P15-2136"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yogatama, D., Liu, F., and Smith, N.A. (2015, January 17\u201321). Extractive Summarization by Maximizing Semantic Volume. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1228"},{"key":"ref_19","unstructured":"Ganesan, K., Zhai, C., and Han, J. (2010, January 23\u201327). Opinosis: A Graph-based Approach to Abstractive Summarization of Highly Redundant Opinions. Proceedings of the 23rd International Conference on Computational Linguistics, Beijing, China."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Li, W. (2015, January 17\u201321). Abstractive Multi-document Summarization with Semantic Information Extraction. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1219"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Nallapati, R., Zhou, B., Gulcehre, C., and Xiang, B. (arXiv, 2016). Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond, arXiv.","DOI":"10.18653\/v1\/K16-1028"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Fuentes, M., Alfonseca, E., and Rodr\u00edguez, H. (2007, January 25\u201327). Support vector machines for query-focused summarization trained and evaluated on pyramid data. Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions; Association for Computational Linguistics, Prague, Czech Republic.","DOI":"10.3115\/1557769.1557788"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wong, K.F., Wu, M., and Li, W. (2008, January 18\u201322). Extractive summarization using supervised and semi-supervised learning. Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1; Association for Computational Linguistics, Manchester, UK.","DOI":"10.3115\/1599081.1599205"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1613\/jair.1523","article-title":"Lexrank: Graph-based lexical centrality as salience in text summarization","volume":"22","author":"Erkan","year":"2004","journal-title":"J. Artif. Intell. Res."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Mihalcea, R. (2004, January 21\u201326). Graph-based Ranking Algorithms for Sentence Extraction, Applied to Text Summarization. Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, ACLdemo\u201904, Barcelona, Spain.","DOI":"10.3115\/1219044.1219064"},{"key":"ref_26","unstructured":"Hatzivassiloglou, V., Klavans, J.L., Holcombe, M.L., Barzilay, R., Kan, M.Y., and McKeown, K.R. (2001, January 2\u20137). Simfinder: A flexible clustering tool for summarization. Proceedings of the NAACL Workshop on Automatic Summarization, Pittsburgh, PA, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Llewellyn, C., Grover, C., and Oberlander, J. (2016, January 7\u201312). Improving Topic Model Clustering of Newspaper Comments for Summarisation. Proceedings of the ACL 2016 Student Research Workshop, Berlin, Germany.","DOI":"10.18653\/v1\/P16-3007"},{"key":"ref_28","unstructured":"Hochreiter, S., Bengio, Y., Frasconi, P., and Schmidhuber, J. (2001). Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies, IEEE Press."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Kim, Y. (arXiv, 2014). Convolutional neural networks for sentence classification, arXiv.","DOI":"10.3115\/v1\/D14-1181"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Rush, A.M., Chopra, S., and Weston, J. (2015, January 17\u201321). A Neural Attention Model for Abstractive Sentence Summarization. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1044"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Cheng, J., and Lapata, M. (2016, January 7\u201312). Neural Summarization by Extracting Sentences and Words. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany.","DOI":"10.18653\/v1\/P16-1046"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1506","DOI":"10.1016\/j.ipm.2007.01.019","article-title":"DUC in context","volume":"43","author":"Over","year":"2007","journal-title":"Inf. Process. Manag."},{"key":"ref_33","unstructured":"Owczarzak, K., and Dang, H.T. (2011, January 14\u201315). Overview of the TAC 2011 summarization track: Guided task and AESOP task. Proceedings of the Text Analysis Conference (TAC 2011), Gaithersburg, MD, USA."},{"key":"ref_34","unstructured":"Napoles, C., Gormley, M., and Van Durme, B. (2012, January 7\u20138). Annotated Gigaword. Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-Scale Knowledge Extraction, Montreal, Canada."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Nakov, P., M\u00e0rquez, L., Moschitti, A., Magdy, W., Mubarak, H., Freihat, A.A., Glass, J., and Randeree, B. (2016, January 16\u201317). SemEval-2016 Task 3: Community Question Answering. Proceedings of the 10th International Workshop on Semantic Evaluation, San Diego, CA, USA.","DOI":"10.18653\/v1\/S16-1083"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1145\/361219.361220","article-title":"A vector space model for automatic indexing","volume":"18","author":"Salton","year":"1975","journal-title":"Commun. ACM"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9","article-title":"Indexing by latent semantic analysis","volume":"41","author":"Deerwester","year":"1990","journal-title":"J. Am. Soc. Inf. Sci."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Papadimitriou, C.H., Tamaki, H., Raghavan, P., and Vempala, S. (1998, January 1\u20134). Latent Semantic Indexing: A Probabilistic Analysis. Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, New York, NY, USA.","DOI":"10.1145\/275487.275505"},{"key":"ref_39","first-page":"993","article-title":"Latent dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1007\/BF02163027","article-title":"Singular value decomposition and least squares solutions","volume":"14","author":"Golub","year":"1970","journal-title":"Numer. Math."},{"key":"ref_41","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (arXiv, 2013). Efficient Estimation of Word Representations in Vector Space, arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Poria, S., Cambria, E., and Gelbukh, A. (2015, January 17\u201321). Deep Convolutional Neural Network Textual Features and Multiple Kernel Learning for Utterance-level Multimodal Sentiment Analysis. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1303"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1016\/j.knosys.2016.06.009","article-title":"Aspect extraction for opinion mining with a deep convolutional neural network","volume":"108","author":"Poria","year":"2016","journal-title":"Knowl. Based Syst."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Wang, B., Liu, K., and Zhao, J. (2016, January 7\u201312). Inner Attention based Recurrent Neural Networks for Answer Selection. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany.","DOI":"10.18653\/v1\/P16-1122"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Tan, M., dos Santos, C., Xiang, B., and Zhou, B. (2016, January 7\u201312). Improved Representation Learning for Question Answer Matching. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany.","DOI":"10.18653\/v1\/P16-1044"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Nie, Y., An, C., Huang, J., Yan, Z., and Han, Y. (2016, January 13\u201316). A Bidirectional LSTM Model for Question Title and Body Analysis in Question Answering. Proceedings of the 2016 IEEE First International Conference on Data Science in Cyberspace, Changsha, China.","DOI":"10.1109\/DSC.2016.72"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Yao, D., Pang, Y., and Lu, X. (2015). Chinese Textual Entailment Recognition Enhanced with Word Embedding, Springer International Publishing.","DOI":"10.1007\/978-3-319-25816-4_8"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Lyu, C., Lu, Y., Ji, D., and Chen, B. (2015, January 9\u201311). Deep Learning for Textual Entailment Recognition. Proceedings of the 2015 IEEE 27th International Conference on Tools with Artificial Intelligence, Salerno, Italia.","DOI":"10.1109\/ICTAI.2015.35"},{"key":"ref_49","unstructured":"Rockt\u00e4schel, T., Grefenstette, E., Hermann, K.M., Ko\u010disk\u00fd, T., and Blunsom, P. (arXiv, 2015). Reasoning about Entailment with Neural Attention, arXiv."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support vector machines","volume":"20","author":"Vapnik","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_51","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1080\/01638539809545028","article-title":"An introduction to latent semantic analysis","volume":"25","author":"Landauer","year":"1998","journal-title":"Discourse Process."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Dahl, G.E., Sainath, T.N., and Hinton, G.E. (2013, January 26\u201331). Improving deep neural networks for LVCSR using rectified linear units and dropout. Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada.","DOI":"10.1109\/ICASSP.2013.6639346"},{"key":"ref_54","unstructured":"Lin, C.Y. (2004, January 25\u201326). Rouge: A package for automatic evaluation of summaries. Proceedings of the Workshop on Text Summarization Branches Out, Barcelona, Spain."},{"key":"ref_55","unstructured":"\u0158eh\u016f\u0159ek, R., and Sojka, P. (2010, January 22). Software Framework for Topic Modelling with Large Corpora. Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, ELRA, Valletta, Malta."},{"key":"ref_56","unstructured":"(arXiv, 2016). Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions, arXiv."},{"key":"ref_57","unstructured":"Zeiler, M.D. (arXiv, 2012). ADADELTA: An Adaptive Learning Rate Method, arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/8\/2\/64\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:38:44Z","timestamp":1760207924000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/8\/2\/64"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,6,12]]},"references-count":57,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2017,6]]}},"alternative-id":["info8020064"],"URL":"https:\/\/doi.org\/10.3390\/info8020064","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2017,6,12]]}}}