{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T17:51:39Z","timestamp":1774374699453,"version":"3.50.1"},"reference-count":32,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2020,1,23]],"date-time":"2020-01-23T00:00:00Z","timestamp":1579737600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61772426"],"award-info":[{"award-number":["61772426"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Arabic is one of the most semantically and syntactically complex languages in the world. A key challenging issue in text mining is text summarization, so we propose an unsupervised score-based method which combines the vector space model, continuous bag of words (CBOW), clustering, and a statistically-based method. The problems with multidocument text summarization are the noisy data, redundancy, diminished readability, and sentence incoherency. In this study, we adopt a preprocessing strategy to solve the noise problem and use the word2vec model for two purposes, first, to map the words to fixed-length vectors and, second, to obtain the semantic relationship between each vector based on the dimensions. Similarly, we use a k-means algorithm for two purposes: (1) Selecting the distinctive documents and tokenizing these documents to sentences, and (2) using another iteration of the k-means algorithm to select the key sentences based on the similarity metric to overcome the redundancy problem and generate the initial summary. Lastly, we use weighted principal component analysis (W-PCA) to map the sentences\u2019 encoded weights based on a list of features. This selects the highest set of weights, which relates to important sentences for solving incoherency and readability problems. We adopted Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as an evaluation measure to examine our proposed technique and compare it with state-of-the-art methods. Finally, an experiment on the Essex Arabic Summaries Corpus (EASC) using the ROUGE-1 and ROUGE-2 metrics showed promising results in comparison with existing methods.<\/jats:p>","DOI":"10.3390\/info11020059","type":"journal-article","created":{"date-parts":[[2020,1,23]],"date-time":"2020-01-23T10:36:02Z","timestamp":1579775762000},"page":"59","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":38,"title":["Multidocument Arabic Text Summarization Based on Clustering and Word2Vec to Reduce Redundancy"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2517-1522","authenticated-orcid":false,"given":"Samer","family":"Abdulateef","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China"}]},{"given":"Naseer Ahmed","family":"Khan","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China"}]},{"given":"Bolin","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China"}]},{"given":"Xuequn","family":"Shang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,1,23]]},"reference":[{"key":"ref_1","first-page":"e12340","article-title":"COSUM: Text summarization based on clustering and optimization","volume":"36","author":"Aliguliyev","year":"2019","journal-title":"Wiley Online Libr."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/j.knosys.2019.03.002","article-title":"Comparison of automatic methods for reducing the Pareto front to a single solution applied to multi-document text summarization","volume":"174","year":"2019","journal-title":"Knowl. -Based Syst."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1016\/j.eswa.2018.11.022","article-title":"MCRMR: Maximum coverage and relevancy with minimal redundancy based multi-document summarization","volume":"120","author":"Verma","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Patel, D.B., Shah, S., and Chhinkaniwala, H.R. (2019). Fuzzy logic based multi Document Summarization with improved sentence scoring and redundancy removal technique. Expert Syst. Appl.","DOI":"10.1016\/j.eswa.2019.05.045"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Mallick, C., Das, A.K., Dutta, M., Das, A.K., and Sarkar, A. (2019). Graph-Based Text Summarization Using Modified TextRank. Soft Computing in Data Analytics, Springer.","DOI":"10.1007\/978-981-13-0514-6_14"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1007\/s10462-017-9566-2","article-title":"Text summarization from legal documents: A survey","volume":"51","author":"Kanapala","year":"2019","journal-title":"Artif. Intell. Rev."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Belkebir, R., and Guessoum, A. (2015). A supervised approach to arabic text summarization using adaboost. New Contributions in Information Systems and Technologies, Springer.","DOI":"10.1007\/978-3-319-16486-1_23"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Amato, F., Marrone, S., Moscato, V., Piantadosi, G., Picariello, A., and Sansone, C. (2019). HOLMeS: eHealth in the Big Data and Deep Learning Era. MDPI Inf., 10.","DOI":"10.3390\/info10020034"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1016\/j.csl.2016.06.005","article-title":"Language. Modeling content and structure for abstractive review summarization","volume":"53","author":"Gerani","year":"2019","journal-title":"Comput. Speech Lang."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Abualigah, L., Bashabsheh, M.Q., Alabool, H., and Shehab, M. (2020). Text Summarization: A Brief Review. Recent Advances in NLP: The Case of Arabic Language, Springer.","DOI":"10.1007\/978-3-030-34614-0_1"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Suleiman, D., Awajan, A.A., and Al Etaiwi, W. (2019, January 9\u201311). Arabic Text Keywords Extraction using Word2vec. Proceedings of the 2019 2nd International Conference on new Trends in Computing Sciences (ICTCS), Amman, Jordan.","DOI":"10.1109\/ICTCS.2019.8923034"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1016\/j.future.2018.11.035","article-title":"Extreme events management using multimedia social networks","volume":"94","author":"Amato","year":"2019","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Al-Abdallah, R.Z., and Al-Taani, A.T. (2019, January 4\u20136). Arabic Text Summarization using Firefly Algorithm. Proceedings of the 2019 Amity International Conference on Artificial Intelligence (AICAI), Dubai, United Arab Emirates.","DOI":"10.1109\/AICAI.2019.8701245"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1007\/s10115-009-0194-2","article-title":"A document-sensitive graph model for multi-document summarization","volume":"22","author":"Wei","year":"2010","journal-title":"Knowl. Inf. Syst."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wan, X., and Yang, J. (2008, January 20\u201324). Multi-document summarization using cluster-based link analysis. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, Singapore.","DOI":"10.1145\/1390334.1390386"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1016\/j.future.2018.04.006","article-title":"Multimedia story creation on social networks","volume":"86","author":"Amato","year":"2018","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1007\/s12559-018-9547-z","article-title":"A hybrid approach for arabic text summarization using domain knowledge and genetic algorithms","volume":"10","author":"Bataineh","year":"2018","journal-title":"Cogn. Comput."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1016\/j.procs.2017.10.091","article-title":"Arabic single-document text summarization using particle swarm optimization algorithm","volume":"117","year":"2017","journal-title":"Procedia Comput. Sci."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lagrini, S., Redjimi, M., and Azizi, N. (2017). Automatic Arabic Text Summarization Approaches. Int. J. Comput. Appl., 164.","DOI":"10.5120\/ijca2017913628"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Bialy, A.A., Gaheen, M.A., ElEraky, R., ElGamal, A., and Ewees, A.A. (2020). Single Arabic Document Summarization Using Natural Language Processing Technique. Recent Advances in NLP: The Case of Arabic Language, Springer.","DOI":"10.1007\/978-3-030-34614-0_2"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1016\/j.procs.2017.10.088","article-title":"Automatic Arabic summarization: A survey of methodologies and systems","volume":"117","author":"Wang","year":"2017","journal-title":"Procedia Comput. Sci."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Badry, R.M., and Moawad, I.F. (2019, January 28\u201330). A Semantic Text Summarization Model for Arabic Topic-Oriented. Proceedings of the International Conference on Advanced Machine Learning Technologies and Applications, Cairo, Egypt.","DOI":"10.1007\/978-3-030-14118-9_52"},{"key":"ref_23","unstructured":"El-Haj, M., Kruschwitz, U., and Fox, C. (2010). Using Mechanical Turk to Create a Corpus of Arabic Summaries, University of Essex."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/j.eswa.2019.01.037","article-title":"Enhancing unsupervised neural networks based text summarization with word embedding and ensemble learning","volume":"123","author":"Alami","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Blagec, K., Xu, H., Agibetov, A., and Samwald, M. (2019). Neural sentence embedding models for semantic similarity estimation in the biomedical domain. BMC Bioinform., 20.","DOI":"10.1186\/s12859-019-2789-2"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Elbarougy, R., Behery, G., and El Khatib, A. (2019). Extractive Arabic Text Summarization Using Modified PageRank Algorithm. Int. Conf. Adv. Mach. Learn. Technol. Appl.","DOI":"10.1016\/j.eij.2019.11.001"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"3797","DOI":"10.1007\/s11042-018-6083-5","article-title":"Feature selection for text classification: A review","volume":"78","author":"Deng","year":"2019","journal-title":"Multimed. Tools Appl."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1016\/j.knosys.2018.09.008","article-title":"A survey of multiple types of text summarization with their satellite contents based on swarm intelligence optimization algorithms","volume":"163","author":"Mosa","year":"2019","journal-title":"Knowl. -Based Syst."},{"key":"ref_29","unstructured":"Adhvaryu, N., and Balani, P. (2015, January 8). Survey: Part-Of-Speech Tagging in NLP. Proceedings of the International Journal of Research in Advent Technology (E-ISSN: 2321-9637) Special Issue 1st International Conference on Advent Trends in Engineering, Science and Technology \u201cICATEST 2015\u201d, Amravati, Maharashtra, India."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Abuobieda, A., Salim, N., Albaham, A.T., Osman, A.H., and Kumar, Y.J. (2012, January 13\u201315). Text summarization features selection method using pseudo genetic-based model. Proceedings of the 2012 International Conference on Information Retrieval & Knowledge Management, Kuala Lumpur, Malaysia.","DOI":"10.1109\/InfRKM.2012.6204980"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1007\/s10462-015-9442-x","article-title":"Automatic Arabic text summarization: A survey","volume":"45","author":"Menai","year":"2016","journal-title":"Artif. Intell. Rev."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1016\/j.neucom.2019.03.060","article-title":"Multivariate time series clustering based on common principal component analysis","volume":"349","author":"Li","year":"2019","journal-title":"Neurocomputing"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/11\/2\/59\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T13:30:22Z","timestamp":1760362222000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/11\/2\/59"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,23]]},"references-count":32,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2020,2]]}},"alternative-id":["info11020059"],"URL":"https:\/\/doi.org\/10.3390\/info11020059","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,1,23]]}}}