{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T22:08:58Z","timestamp":1740175738782,"version":"3.37.3"},"reference-count":51,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2022,12,10]],"date-time":"2022-12-10T00:00:00Z","timestamp":1670630400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,12,10]],"date-time":"2022-12-10T00:00:00Z","timestamp":1670630400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"2022 Guangdong-Hong Kong-Macao Greater Bay Area Exchange Programs of SCNU"},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["Grant 62006080"],"award-info":[{"award-number":["Grant 62006080"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Guangdong Natural Science Funds for Distinguished Young Scholars","award":["Grant 2022B1515020049"],"award-info":[{"award-number":["Grant 2022B1515020049"]}]},{"name":"Guangdong Regional Joint Fund for Basic and Applied Research","award":["Grant 2021B1515120078"],"award-info":[{"award-number":["Grant 2021B1515120078"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Document clustering has long been an important research direction in intelligent system. When being applied to process Chinese documents, new challenges were posted since it is infeasible to directly split the Chinese documents using the whitespace character. Moreover, many Chinese document clustering algorithms require prior knowledge of the cluster number, which is impractical to know in real-world applications. Considering these problems, we propose a general Chinese document clustering framework, where the main clustering task is fulfilled with an adaptive encoding-based evolutionary approach. Specifically, the adaptive encoding scheme is proposed to automatically learn the cluster number, and novel crossover and mutation operators are designed to fit this scheme. In addition, a single step of <jats:italic>K<\/jats:italic>-means is incorporated to conduct a joint global and local search, enhancing the overall exploitation ability. The experiments on benchmark datasets demonstrate the superiority of the proposed method in both the efficiency and the clustering precision.<\/jats:p>","DOI":"10.1007\/s40747-022-00934-z","type":"journal-article","created":{"date-parts":[[2022,12,10]],"date-time":"2022-12-10T09:02:38Z","timestamp":1670662958000},"page":"3385-3398","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Adaptive encoding-based evolutionary approach for Chinese document clustering"],"prefix":"10.1007","volume":"9","author":[{"given":"Jun-Xian","family":"Chen","sequence":"first","affiliation":[]},{"given":"Yue-Jiao","family":"Gong","sequence":"additional","affiliation":[]},{"given":"Wei-Neng","family":"Chen","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8827-3991","authenticated-orcid":false,"given":"Xiaolin","family":"Xiao","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,12,10]]},"reference":[{"issue":"1","key":"934_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s40747-021-00343-8","volume":"8","author":"Z Huang","year":"2022","unstructured":"Huang Z, Xie Z (2022) A patent keywords extraction method using textrank model with prior public knowledge. Complex Intell Syst 8(1):1\u201312","journal-title":"Complex Intell Syst"},{"issue":"1","key":"934_CR2","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1007\/s40747-019-00123-5","volume":"6","author":"J Chen","year":"2020","unstructured":"Chen J, Zhao C, Chen L et al (2020) Collaborative filtering recommendation algorithm based on user correlation and evolutionary clustering. Complex Intell Syst 6(1):147\u2013156","journal-title":"Complex Intell Syst"},{"issue":"1","key":"934_CR3","doi-asserted-by":"publisher","first-page":"439","DOI":"10.1007\/s40747-020-00212-w","volume":"7","author":"Q Zhang","year":"2021","unstructured":"Zhang Q, Lu J, Jin Y (2021) Artificial intelligence in recommender systems. Complex Intell Syst 7(1):439\u2013457","journal-title":"Complex Intell Syst"},{"issue":"5","key":"934_CR4","doi-asserted-by":"publisher","first-page":"2765","DOI":"10.1007\/s40747-021-00450-6","volume":"7","author":"H Cong","year":"2021","unstructured":"Cong H, Chen W-N, Yu W-J (2021) A two-stage information retrieval system based on interactive multimodal genetic algorithm for query weight optimization. Complex Intell Syst 7(5):2765\u20132781","journal-title":"Complex Intell Syst"},{"issue":"6","key":"934_CR5","doi-asserted-by":"publisher","first-page":"2977","DOI":"10.1007\/s40747-021-00482-y","volume":"7","author":"F Yin","year":"2021","unstructured":"Yin F, Wang Y, Liu J, Tosato M (2021) Modeling multi-prototype Chinese word representation learning for word similarity. Complex Intell Syst 7(6):2977\u20132990","journal-title":"Complex Intell Syst"},{"key":"934_CR6","volume-title":"Algorithms for clustering data","author":"AK Jain","year":"1988","unstructured":"Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall Inc, USA"},{"key":"934_CR7","doi-asserted-by":"crossref","unstructured":"Muflikhah L, Baharudin B (2009) Document clustering using concept space and cosine similarity measurement. In: 2009 international conference on computer technology and development, vol\u00a01, pp 58\u201362","DOI":"10.1109\/ICCTD.2009.206"},{"key":"934_CR8","doi-asserted-by":"publisher","first-page":"465","DOI":"10.1016\/0306-4573(86)90097-X","volume":"22","author":"E Voorhees","year":"1986","unstructured":"Voorhees E (1986) Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Inf Process Manag 22:465\u201376","journal-title":"Inf Process Manag"},{"key":"934_CR9","doi-asserted-by":"crossref","unstructured":"Gil-Garcia R, Pons-Porrata A (2010) Dynamic hierarchical algorithms for document clustering. Pattern Recognit Lett 31(6):469\u2013477 (cIARP 2008: robust and efficient analysis of signals and images)","DOI":"10.1016\/j.patrec.2009.11.011"},{"key":"934_CR10","doi-asserted-by":"crossref","unstructured":"Dhillon IS (2001) Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 269\u2013274","DOI":"10.1145\/502512.502550"},{"key":"934_CR11","doi-asserted-by":"crossref","unstructured":"Dhillon IS, Mallela S, Modha DS (2003) Information-theoretic co-clustering. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 89\u201398","DOI":"10.1145\/956750.956764"},{"key":"934_CR12","unstructured":"Elavarasi SA, Akilandeswari J, Sathiyabhama B (2022) A survey on partition clustering algorithms. Int J Enterprise Comput Bus Syst 1(1)"},{"key":"934_CR13","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1016\/j.phpro.2012.05.066","volume":"33","author":"M Yao","year":"2012","unstructured":"Yao M, Pi D, Cong X (2012) Chinese text clustering algorithm based k-means. Phys Procedia 33:301\u2013307","journal-title":"Phys Procedia"},{"key":"934_CR14","doi-asserted-by":"crossref","unstructured":"Xiong C, Hua Z, Lv K, Li X (2016) An improved k-means text clustering algorithm by optimizing initial cluster centers. In: International conference on cloud computing and big data, pp 265\u2013268","DOI":"10.1109\/CCBD.2016.059"},{"issue":"6","key":"934_CR15","doi-asserted-by":"publisher","first-page":"3211","DOI":"10.1007\/s40747-021-00512-9","volume":"7","author":"V Mehta","year":"2021","unstructured":"Mehta V, Bawa S, Singh J (2021) Weclustering: word embeddings based text clustering technique for large datasets. Complex Intell Syst 7(6):3211\u20133224","journal-title":"Complex Intell Syst"},{"issue":"1","key":"934_CR16","first-page":"100","volume":"28","author":"JA Hartigan","year":"1979","unstructured":"Hartigan JA, Wong MA (1979) Algorithm as 136: a $$k$$-means clustering algorithm, Journal of the Royal Statistical Society. Ser C (Appl Stat) 28(1):100\u2013108","journal-title":"Ser C (Appl Stat)"},{"key":"934_CR17","doi-asserted-by":"crossref","unstructured":"Cui X, Potok TE, Palathingal P (2005) Document clustering using particle swarm optimization. In: Proceedings IEEE swarm intelligence symposium, pp 185\u2013191","DOI":"10.1109\/SIS.2005.1501621"},{"issue":"5","key":"934_CR18","doi-asserted-by":"publisher","first-page":"2517","DOI":"10.1016\/j.eswa.2014.11.003","volume":"42","author":"W Song","year":"2015","unstructured":"Song W, Qiao Y, Park SC, Qian X (2015) A hybrid evolutionary computation approach with its application for optimizing text document clustering. Expert Syst Appl 42(5):2517\u20132524","journal-title":"Expert Syst Appl"},{"key":"934_CR19","doi-asserted-by":"crossref","unstructured":"Zhang Z, Cheng H, Zhang S, Chen W, Fang Q (2008) Clustering aggregation based on genetic algorithm for documents clustering. In: IEEE congress on evolutionary computation, pp 3156\u20133161","DOI":"10.1109\/CEC.2008.4631225"},{"key":"934_CR20","doi-asserted-by":"crossref","unstructured":"Tseng C-M, Tsai K-H, Hsu C-C, Chang H-C (2005) On the Chinese document clustering based on dynamical term clustering. In: Information retrieval technology. Springer, Berlin, pp 534\u2013539","DOI":"10.1007\/11562382_46"},{"issue":"2","key":"934_CR21","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1109\/TCSS.2019.2897641","volume":"6","author":"X Geng","year":"2019","unstructured":"Geng X, Zhang Y, Jiao Y, Mei Y (2019) A novel hybrid clustering algorithm for topic detection on Chinese microblogging. IEEE Trans Comput Soc Syst 6(2):289\u2013300","journal-title":"IEEE Trans Comput Soc Syst"},{"issue":"10","key":"934_CR22","doi-asserted-by":"publisher","first-page":"1279","DOI":"10.1109\/TKDE.2004.58","volume":"16","author":"KM Hammouda","year":"2004","unstructured":"Hammouda KM, Kamel MS (2004) Efficient phrase-based document indexing for web document clustering. IEEE Trans Knowl Data Eng 16(10):1279\u20131296","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"2","key":"934_CR23","doi-asserted-by":"publisher","first-page":"343","DOI":"10.1109\/TNNLS.2016.2626311","volume":"29","author":"X Pei","year":"2016","unstructured":"Pei X, Chen C, Gong W (2016) Concept factorization with adaptive neighbors for document clustering. IEEE Trans Neural Netw Learn Syst 29(2):343\u2013352","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"10","key":"934_CR24","doi-asserted-by":"publisher","first-page":"1929","DOI":"10.1109\/TKDE.2017.2781721","volume":"30","author":"AJ Brockmeier","year":"2018","unstructured":"Brockmeier AJ, Mu T, Ananiadou S, Goulermas JY (2018) Self-tuned descriptive document clustering using a predictive network. IEEE Trans Knowl Data Eng 30(10):1929\u20131942","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"2","key":"934_CR25","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1145\/261342.571216","volume":"28","author":"DS Hochba","year":"1997","unstructured":"Hochba DS (1997) Approximation algorithms for np-hard problems. ACM Sigact News 28(2):40\u201352","journal-title":"ACM Sigact News"},{"key":"934_CR26","doi-asserted-by":"crossref","unstructured":"Meena YK, Shashank V, Singh P (2012) Article: text documents clustering using genetic algorithm and discrete differential evolution. Int J Comput Appl 43(1):16\u201319","DOI":"10.5120\/6067-8221"},{"key":"934_CR27","doi-asserted-by":"crossref","unstructured":"Kamel N, Ouchen I, Baali K (2014) A sampling-pso-k-means algorithm for document clustering. In: Genetic and evolutionary computing. Springer International Publishing, pp 45\u201354","DOI":"10.1007\/978-3-319-01796-9_5"},{"key":"934_CR28","doi-asserted-by":"crossref","unstructured":"Lee JS, Park SC (2012) Document clustering using multi-objective genetic algorithms on matlab distributed computing. In: International conference on information science and applications, pp 1\u20136","DOI":"10.1109\/ICISA.2012.6220980"},{"key":"934_CR29","doi-asserted-by":"crossref","unstructured":"Abualigah LM, Khader AT, Al-Betar MA (2016) Multi-objectives-based text clustering technique using k-mean algorithm. In: 7th international conference on computer science and information technology, pp 1\u20136","DOI":"10.1109\/CSIT.2016.7549464"},{"key":"934_CR30","doi-asserted-by":"crossref","unstructured":"Cobos C, Montealegre C, Mejia M, Mendoza M, Leon E (2010) Web document clustering based on a new niching memetic algorithm, term-document matrix and Bayesian information criterion. In: IEEE congress on evolutionary computation, pp 1\u20138","DOI":"10.1109\/CEC.2010.5586016"},{"issue":"2","key":"934_CR31","doi-asserted-by":"publisher","first-page":"275","DOI":"10.1177\/0165551516638784","volume":"43","author":"A Onan","year":"2017","unstructured":"Onan A, Bulut H, Korukoglu S (2017) An improved ant algorithm with lda-based representation for text document clustering. J Inf Sci 43(2):275\u2013292","journal-title":"J Inf Sci"},{"key":"934_CR32","doi-asserted-by":"crossref","unstructured":"Akter R, Chung Y (2017) An improved evolutionary approach for document clustering. In: Proceedings of the international conference on research in adaptive and convergent systems, ACM, pp 40\u201343","DOI":"10.1145\/3129676.3129733"},{"key":"934_CR33","doi-asserted-by":"crossref","unstructured":"Wahid A, Gao X, Andreae P(2014) Multi-view clustering of web documents using multi-objective genetic algorithm. In: IEEE congress on evolutionary computation, pp 2625\u20132632","DOI":"10.1109\/CEC.2014.6900586"},{"key":"934_CR34","unstructured":"(Sep. 29, 2012). Jieba Software. [Online]. https:\/\/github.com\/fxsjy\/jieba"},{"key":"934_CR35","unstructured":"Salton G, McGill MJ (1986) Introduction to modern information retrieval. McGraw-Hill, Inc"},{"key":"934_CR36","volume-title":"SpringerLink, principal component analysis","author":"IT Jolliffe","year":"2002","unstructured":"Jolliffe IT (2002) SpringerLink, principal component analysis, 2nd edn. Springer, Secaucus","edition":"2"},{"key":"934_CR37","doi-asserted-by":"crossref","unstructured":"Weng J, Zhang Y, Hwang W-S (2003) Candid covariance-free incremental principal component analysis. IEEE Trans Pattern Anal Mach Intell 25(8):1034\u20131040","DOI":"10.1109\/TPAMI.2003.1217609"},{"key":"934_CR38","doi-asserted-by":"publisher","first-page":"785","DOI":"10.1109\/TASLP.2020.2967539","volume":"28","author":"M Zhang","year":"2020","unstructured":"Zhang M, Ge Z, Liu T, Wu X, Qu T (2020) Modeling of individual hrtfs based on spatial principal component analysis. IEEE\/ACM Trans Audio Speech Lang Process 28:785\u2013797","journal-title":"IEEE\/ACM Trans Audio Speech Lang Process"},{"issue":"3","key":"934_CR39","doi-asserted-by":"publisher","first-page":"433","DOI":"10.1109\/3477.764879","volume":"29","author":"K Krishna","year":"1999","unstructured":"Krishna K, Murty MN (1999) Genetic $$k$$-means algorithm. IEEE Trans Syst Man Cybern Part B (Cybern) 29(3):433\u2013439","journal-title":"IEEE Trans Syst Man Cybern Part B (Cybern)"},{"key":"934_CR40","doi-asserted-by":"crossref","unstructured":"Das S, Abraham A, Konar A (2008) Automatic clustering using an improved differential evolution algorithm. IEEE Trans Syst Man Cybern Part A Syst Hum 38(1):218\u2013237","DOI":"10.1109\/TSMCA.2007.909595"},{"key":"934_CR41","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2020.106583","volume":"95","author":"S Liang","year":"2020","unstructured":"Liang S, Han D, Yang Y (2020) Cluster validity index for irregular clustering results. Appl Soft Comput 95:106583","journal-title":"Appl Soft Comput"},{"key":"934_CR42","doi-asserted-by":"publisher","first-page":"224","DOI":"10.1109\/TPAMI.1979.4766909","volume":"2","author":"DL Davies","year":"1979","unstructured":"Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 2:224\u2013227","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"1","key":"934_CR43","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1109\/5326.923275","volume":"31","author":"S Bandyopadhyay","year":"2001","unstructured":"Bandyopadhyay S, Maulik U (2001) Nonparametric genetic clustering: comparison of validity indices. IEEE Trans Syst Man Cybern Part C (Appl Rev) 31(1):120\u2013125","journal-title":"IEEE Trans Syst Man Cybern Part C (Appl Rev)"},{"issue":"2","key":"934_CR44","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1109\/TETCI.2018.2863728","volume":"4","author":"V Kuppili","year":"2020","unstructured":"Kuppili V, Biswas M, Edla DR, Prasad KJR, Suri JS (2020) A mechanics-based similarity measure for text classification in machine learning paradigm. IEEE Trans Emerg Top Comput Intell 4(2):180\u2013200","journal-title":"IEEE Trans Emerg Top Comput Intell"},{"key":"934_CR45","doi-asserted-by":"publisher","first-page":"192","DOI":"10.1016\/j.eswa.2019.05.030","volume":"134","author":"R Janani","year":"2019","unstructured":"Janani R, Vijayarani S (2019) Text document clustering using spectral clustering algorithm with particle swarm optimization. Expert Syst Appl 134:192\u2013200","journal-title":"Expert Syst Appl"},{"issue":"2","key":"934_CR46","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1002\/asi.5090060209","volume":"6","author":"A Kent","year":"1955","unstructured":"Kent A, Berry MM, Luehrs FU Jr, Perry JW (1955) Machine literature searching viii. operational criteria for designing information retrieval systems. Am Document 6(2):93\u2013101","journal-title":"Am Document"},{"key":"934_CR47","doi-asserted-by":"crossref","unstructured":"Goutte C, Gaussier E (2005) A probabilistic interpretation of precision, recall and f-score, with implication for evaluation. In: European conference on information retrieval. Springer, pp 345\u2013359","DOI":"10.1007\/978-3-540-31865-1_25"},{"key":"934_CR48","doi-asserted-by":"crossref","unstructured":"Meila M (2005) Comparing clusterings: an axiomatic view. In: Proceedings of the 22nd international conference on machine learning, pp 577\u2013584","DOI":"10.1145\/1102351.1102424"},{"key":"934_CR49","doi-asserted-by":"crossref","unstructured":"Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 50\u201360","DOI":"10.1214\/aoms\/1177730491"},{"issue":"260","key":"934_CR50","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1080\/01621459.1952.10483441","volume":"47","author":"WH Kruskal","year":"1952","unstructured":"Kruskal WH, Wallis WA (1952) Use of ranks in one-criterion variance analysis. J Am Stat Assoc 47(260):583\u2013621","journal-title":"J Am Stat Assoc"},{"key":"934_CR51","doi-asserted-by":"crossref","unstructured":"Storn R, Price K (1996) Minimizing the real functions of the ICEC\u201996 contest by differential evolution. In: 1996 international conference on evolutionary computation, IEEE, pp 842\u2013844","DOI":"10.1109\/ICEC.1996.542711"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00934-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00934-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00934-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T17:12:28Z","timestamp":1686330748000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00934-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,10]]},"references-count":51,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,6]]}},"alternative-id":["934"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00934-z","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2022,12,10]]},"assertion":[{"value":"7 July 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 November 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 December 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}