{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T10:28:43Z","timestamp":1771064923158,"version":"3.50.1"},"reference-count":46,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2021,1,18]],"date-time":"2021-01-18T00:00:00Z","timestamp":1610928000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Automatic extractive text summarization retrieves a subset of data that represents most notable sentences in the entire document. In the era of digital explosion, which is mostly unstructured textual data, there is a demand for users to understand the huge amount of text in a short time; this demands the need for an automatic text summarizer. From summaries, the users get the idea of the entire content of the document and can decide whether to read the entire document or not. This work mainly focuses on generating a summary from multiple news documents. In this case, the summary helps to reduce the redundant news from the different newspapers. A multi-document summary is more challenging than a single-document summary since it has to solve the problem of overlapping information among sentences from different documents. Extractive text summarization yields the sensitive part of the document by neglecting the irrelevant and redundant sentences. In this paper, we propose a framework for extracting a summary from multiple documents in the Malayalam Language. Also, since the multi-document summarization data set is sparse, methods based on deep learning are difficult to apply. The proposed work discusses the performance of existing standard algorithms in multi-document summarization of the Malayalam Language. We propose a sentence extraction algorithm that selects the top ranked sentences with maximum diversity. The system is found to perform well in terms of precision, recall, and F-measure on multiple input documents.<\/jats:p>","DOI":"10.3390\/info12010041","type":"journal-article","created":{"date-parts":[[2021,1,19]],"date-time":"2021-01-19T04:55:31Z","timestamp":1611032131000},"page":"41","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["A Framework for Generating Extractive Summary from Multiple Malayalam Documents"],"prefix":"10.3390","volume":"12","author":[{"given":"K.","family":"Manju","sequence":"first","affiliation":[{"name":"Department of Computer Science, Cochin University of Science and Technology (CUSAT), Kochi 682022, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"S.","family":"David Peter","sequence":"additional","affiliation":[{"name":"School of Engineering, Cochin University of Science and Technology (CUSAT), Kochi 682022, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sumam","family":"Idicula","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Cochin University of Science and Technology (CUSAT), Kochi 682022, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,1,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1147\/rd.22.0159","article-title":"The Automatic Creation of Literature Abstracts","volume":"2","author":"Luhn","year":"1958","journal-title":"IBM J. Res. Dev."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Gong, Y., and Liu, X. (2001). Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis, Association for Computing Machinery.","DOI":"10.1145\/383952.383955"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/j.ipm.2010.03.005","article-title":"Applying regression models to query-focused multi-document summarization","volume":"47","author":"Ouyang","year":"2011","journal-title":"Inf. Process. Manag."},{"key":"ref_4","unstructured":"Radev, D., Blair-Goldensohn, S., and Zhang, Z. (2001, January 13\u201314). Experiments in Single and Multi-Document Summarization Using MEAD. Proceedings of the First Document Understanding Conference, New Orleans, LA, USA."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.knosys.2016.01.030","article-title":"Multi-document summarization using closed patterns","volume":"99","author":"Qiang","year":"2016","journal-title":"Knowl.-Based Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1016\/j.eswa.2017.05.075","article-title":"Extractive multi-document summarization using population-based multicriteria optimization","volume":"86","author":"John","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Widjanarko, A., Kusumaningrum, R., and Surarso, B. (2018, January 6\u20137). Multi document summarization for the Indonesian language based on latent dirichlet allocation and significance sentence. Proceedings of the 2018 International Conference on Information and Communications Technology (ICOIACT), Yogyakarta, Indonesia.","DOI":"10.1109\/ICOIACT.2018.8350668"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1016\/j.eswa.2016.12.021","article-title":"Word-sentence co-ranking for automatic extractive text summarization","volume":"72","author":"Fang","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s10462-016-9475-9","article-title":"Recent automatic text summarization techniques: A survey","volume":"47","author":"Gambhir","year":"2017","journal-title":"Artif. Intell. Rev."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Krishnaprasad, P., Sooryanarayanan, A., and Ramanujan, A. (2016, January 1\u20133). Malayalam text summarization: An extractive approach. Proceedings of the 2016 International Conference on Next Generation Intelligent Systems (ICNGIS), Kottayam, India.","DOI":"10.1109\/ICNGIS.2016.7854008"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Kishore, K., Gopal, G.N., and Neethu, P. (2016, January 12\u201313). Document Summarization in Malayalam with sentence framing. Proceedings of the 2016 International Conference on Information Science (ICIS), Kochi, India.","DOI":"10.1109\/INFOSCI.2016.7845326"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kabeer, R., and Idicula, S.M. (2014, January 26\u201328). Text summarization for Malayalam documents\u2014An experience. Proceedings of the 2014 International Conference on Data Science & Engineering (ICDSE), Kochi, India.","DOI":"10.1109\/ICDSE.2014.6974627"},{"key":"ref_13","first-page":"8","article-title":"Summarization of Malayalam Document Using Relevance of Sentences","volume":"1","author":"Ajmal","year":"2015","journal-title":"Int. J. Latest Res. Eng. Technol."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1007\/s12046-019-1248-0","article-title":"A novel extractive text summarization system with self-organizing map clustering and entity recognition","volume":"45","author":"Raj","year":"2020","journal-title":"S\u0101dhan\u0101"},{"key":"ref_15","unstructured":"Manju, K., David, P.S., and Idicula Sumam, M. (2016, January 11\u201312). An extractive multi-document summarization system for Malayalam news documents. Proceedings of the 1st EAI International Conference on Computer Science and Engineering, Penang, Malaysia."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1366","DOI":"10.1016\/j.patrec.2008.02.008","article-title":"An effective sentence-extraction technique using contextual information and statistical approaches for text summarization","volume":"29","author":"Ko","year":"2008","journal-title":"Pattern Recognit. Lett."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.eswa.2017.04.054","article-title":"A Topic Modeling Based Approach to Novel Document Automatic Summarization","volume":"84","author":"Wu","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_18","unstructured":"Mihalcea, R., and Tarau, P. (2004, January 25\u201326). Textrank: Bringing order into text. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1016\/j.eij.2019.11.001","article-title":"Extractive Arabic Text Summarization Using Modified PageRank Algorithm","volume":"21","author":"Elbarougy","year":"2020","journal-title":"Egypt. Inform. J."},{"key":"ref_20","unstructured":"Goldstein, J., and Carbonell, J. (1998). Summarization: Using MMR for Diversity-Based Reranking and Evaluating Summaries, Language Technologies Institute at Carnegie Mellon University. Technical Report."},{"key":"ref_21","first-page":"258","article-title":"Accountability of NLP Tools in Text Summarization for Indian Languages","volume":"64","author":"Verma","year":"2020","journal-title":"J. Sci. Res."},{"key":"ref_22","first-page":"469","article-title":"Generating Natural Language Summaries from Multiple On-Line Sources","volume":"24","author":"Radev","year":"1998","journal-title":"Comput. Linguist."},{"key":"ref_23","unstructured":"Erkan, G., and Radev, D. (2004, January 25\u201326). Lexpagerank: Prestige in multi-document text summarization. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Barrera, A., and Verma, R. (2012). Combining Syntax and Semantics for Automatic Extractive Single-Document Summarization, Springer.","DOI":"10.1007\/978-3-642-28601-8_31"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1016\/j.eij.2019.12.002","article-title":"Extractive multi-document text summarization based on graph independent sets","volume":"21","year":"2020","journal-title":"Egypt. Inform. J."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"De la Pe\u00f1a Sarrac\u00e9n, G.L., and Rosso, P. (2018). Automatic Text Summarization Based on Betweenness Centrality, Association for Computing Machinery.","DOI":"10.1145\/3230599.3230611"},{"key":"ref_27","unstructured":"Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web, Stanford InfoLab. Technical Report."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"102187","DOI":"10.1016\/j.ipm.2019.102187","article-title":"Karc\u0131 summarization: A simple and effective approach for automatic text summarization using Karc\u0131 entropy","volume":"57","author":"Hark","year":"2020","journal-title":"Inf. Process. Manag."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"102088","DOI":"10.1016\/j.ipm.2019.102088","article-title":"Textual keyword extraction and summarization: State-of-the-art","volume":"56","author":"Nasar","year":"2019","journal-title":"Inf. Process. Manag."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Moratanch, N., and Chitrakala, S. (2017, January 10\u201311). A survey on extractive text summarization. Proceedings of the 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), Chennai, India.","DOI":"10.1109\/ICCCSP.2017.7944061"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Nallapati, R., Zhai, F., and Zhou, B. (2017, January 4\u20139). SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI\u201917, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.10958"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"24205","DOI":"10.1109\/ACCESS.2018.2829199","article-title":"A Hierarchical Structured Self-Attentive Model for Extractive Document Summarization (HSSAS)","volume":"6","author":"Zuping","year":"2018","journal-title":"IEEE Access"},{"key":"ref_33","first-page":"17","article-title":"Article: Comparative Study of Text Summarization in Indian Languages","volume":"75","author":"Dhanya","year":"2013","journal-title":"Int. J. Comput. Appl."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.procs.2016.05.121","article-title":"A study on abstractive summarization techniques in indian languages","volume":"87","author":"Sunitha","year":"2016","journal-title":"Procedia Comput. Sci."},{"key":"ref_35","unstructured":"Thottungal, S. (2019, March 12). Indic Stemmer. Available online: https:\/\/silpa.readthedocs.io\/projects\/indicstemmer."},{"key":"ref_36","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv."},{"key":"ref_37","unstructured":"Arora, S., Liang, Y., and Ma, T. (2017, January 24\u201326). A simple but tough-to-beat baseline for sentence embeddings. Proceedings of the ICLR 2017, Toulon, France."},{"key":"ref_38","unstructured":"(2019, May 18). Natural Language Processing at KBCS, CDAC Mumbai. Available online: http:\/\/kbcs.in\/tools.php."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1007\/s12046-019-1082-4","article-title":"A novel approach for text summarization using optimal combination of sentence scoring methods","volume":"44","author":"Verma","year":"2019","journal-title":"S\u0101dhan\u0101"},{"key":"ref_40","unstructured":"Jones, K.S., and Galliers, J.R. (1995). Evaluating Natural Language Processing Systems: An Analysis and Review, Springer Science & Business Media."},{"key":"ref_41","unstructured":"Ibrahim Altmami, N., and El Bachir Menai, M. (2020). Automatic summarization of scientific articles: A survey. J. King Saud Univ. Comput. Inf. Sci."},{"key":"ref_42","unstructured":"Lin, C.Y. (2004, January 25\u201326). Rouge: A package for automatic evaluation of summaries. Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Kumar, K.V., and Yadav, D. (2015). An improvised extractive approach to hindi text summarization. Information Systems Design and Intelligent Applications, Springer.","DOI":"10.1007\/978-81-322-2250-7_28"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1007\/s12559-015-9359-3","article-title":"A novel hybrid text summarization system for Punjabi text","volume":"8","author":"Gupta","year":"2016","journal-title":"Cogn. Comput."},{"key":"ref_45","first-page":"1204","article-title":"Extractive Text Summarization of Marathi News Articles","volume":"5","author":"Rathod","year":"2018","journal-title":"Int. Res. J. Eng. Technol."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Banu, M., Karthika, C., Sudarmani, P., and Geetha, T. (2007, January 13\u201315). Tamil document summarization using semantic graph method. Proceedings of the International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007), Sivakasi, India.","DOI":"10.1109\/ICCIMA.2007.247"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/12\/1\/41\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:12:30Z","timestamp":1760159550000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/12\/1\/41"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,18]]},"references-count":46,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,1]]}},"alternative-id":["info12010041"],"URL":"https:\/\/doi.org\/10.3390\/info12010041","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,18]]}}}