{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T16:08:20Z","timestamp":1765296500271,"version":"3.46.0"},"reference-count":34,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T00:00:00Z","timestamp":1765238400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>The assessment of Chinese text readability plays a significant role in Chinese language education. Due to the intrinsic differences between alphabetic languages and Chinese character representations, the readability assessment becomes more challenging in terms of the language\u2019s inherent complexity in vocabulary, syntax, and semantics. The article proposed the conceptual analogy between Chinese readability assessment and music\u2019s rhythm and tempo patterns, in which the syntactic structures of the Chinese sentences could be transformed into an image. The Chinese Knowledge and Information Processing Tagger (CkipTagger) tool developed by Sinica-Taiwan is utilized to decompose the Chinese text into a set of tokens. These tokens are then refined through a user-defined token pool to retain meaningful units. An image with part-of-speech (POS) information will be generated by using the token versus syntax alignment. A discrete cosine transform (DCT) is then applied to extract the temporal characteristics of the text. Moreover, the study integrated four categories: linguistic features\u2013type\u2013token ratio, average sentence length, total word, and difficulty level of vocabulary for the readability assessment. Finally, these features were fed into the Support Vector Machine (SVM) network for the classifications. Furthermore, a bidirectional long short-term memory (Bi-LSTM) network is adopted for quantitative comparisons. In simulation, a total of 774 Chinese texts fitted with Taiwan Benchmarks for the Chinese Language were selected and graded by Chinese language experts, consisting of equal amounts of basic, intermediate, and advanced levels. The finding indicated the proposed POS with the linguistic features work well in the SVM network, and the performance matches with the more complex architectures like the Bi-LSTM network in Chinese readability assessments.<\/jats:p>","DOI":"10.3390\/a18120777","type":"journal-article","created":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T15:50:02Z","timestamp":1765295402000},"page":"777","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Chinese Text Readability Assessment Based on the Integration of Visualized Part-of-Speech Information with Linguistic Features"],"prefix":"10.3390","volume":"18","author":[{"given":"Chi-Yi","family":"Hsieh","sequence":"first","affiliation":[{"name":"The Institute of Chinese Language Education, National Kaohsiung Normal University, Kaohsiung 80201, Taiwan"}]},{"given":"Jing-Yan","family":"Lin","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, National Chiayi University, Chiayi City 600325, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7006-8794","authenticated-orcid":false,"given":"Chi-Wen","family":"Hsieh","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, National Chung Cheng University, Minhsiung 621301, Taiwan"},{"name":"Advanced Institute of Manufacturing with High-Tech Innovations, Ans. 621301 Innovation Building R209, 168 University Road, Ming-Hsiung Township, Chia-Yi 621301, Taiwan"}]},{"given":"Bo-Yuan","family":"Huang","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, National Chung Cheng University, Minhsiung 621301, Taiwan"}]},{"given":"Yi-Chi","family":"Huang","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, National Chung Cheng University, Minhsiung 621301, Taiwan"}]},{"given":"Yu-Xiang","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, National Chung Cheng University, Minhsiung 621301, Taiwan"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1080\/00220671.1938.10880766","article-title":"Summary of reading investigations (July 1, 1936-June 30, 1937)","volume":"31","author":"Gray","year":"1938","journal-title":"J. Educ. Res."},{"key":"ref_2","first-page":"194","article-title":"Readability: A Factor in Student Research?","volume":"53","author":"Gray","year":"2012","journal-title":"Ref. Libr."},{"key":"ref_3","unstructured":"Sherman, L.A. (1893). Analytics of Literature: A Manual for the Objective Study of English Prose and Poetry, Ginn and Company."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Hu, T., Chen, Z., Ge, J., Yang, Z., and Xu, J. (2023). A Chinese Few-Shot Text Classification Method Utilizing Improved Prompt Learning and Unlabeled Data. Appl. Sci., 13.","DOI":"10.3390\/app13053334"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Liu, H., Ye, Z., Zhao, H., and Yang, Y. (2023). Chinese Text De-Colloquialization Technique Based on Back-Translation Strategy and End-to-End Learning. Appl. Sci., 13.","DOI":"10.3390\/app131910818"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Guo, S., Huang, Y., Huang, B., Yang, L., and Zhou, C. (2023). CWSXLNet: A Sentiment Analysis Model Based on Chinese Word Segmentation Information Enhancement. Appl. Sci., 13.","DOI":"10.3390\/app13064056"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Kostadimas, D., Kermanidis, K.L., and Andronikos, T. (2024). Exploring the Effectiveness of Shallow and L2 Learner-Suitable Textual Features for Supervised and Unsupervised Sentence-Based Readability Assessment. Appl. Sci., 14.","DOI":"10.3390\/app14177997"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Liu, Y., Li, S., Deng, Y., Hao, S., and Wang, L. (2024). SSuieBERT: Domain Adaptation Model for Chinese Space Science Text Mining and Information Extraction. Electronics, 13.","DOI":"10.3390\/electronics13152949"},{"key":"ref_9","unstructured":"Ratajczak, M. (2020). The Effects of Individual Differences and Linguistic Features on Reading Comprehension of Health-Related Texts, Lancaster University."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1080\/01638530902959943","article-title":"Coh-Metrix: Capturing linguistic features of cohesion","volume":"47","author":"McNamara","year":"2010","journal-title":"Discourse Process."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1017\/S1351324919000093","article-title":"Integrating LSA-based hierarchical conceptual space and machine learning methods for leveling the readability of domain-specific texts","volume":"25","author":"Tseng","year":"2019","journal-title":"Nat. Lang. Eng."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1162\/COLI_a_00255","article-title":"All mixed up? Finding the optimal feature set for general readability prediction and its application to English and Dutch","volume":"42","author":"Hoste","year":"2016","journal-title":"Comput. Linguist."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"67610","DOI":"10.1109\/ACCESS.2021.3077073","article-title":"Combining readability formulas and machine learning for reader-oriented evaluation of online health resources","volume":"9","author":"Liu","year":"2021","journal-title":"IEEE Access"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Maqsood, S., Shahid, A., Afzal, M.T., Roman, M., Khan, Z., Nawaz, Z., and Aziz, M.H. (2022). Assessing English language sentences readability using machine learning models. PeerJ Comput. Sci., 8.","DOI":"10.7717\/peerj-cs.818"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Dascalu, M. (2014). Analyzing Discourse and Text Complexity for Learning and Collaborating, Springer Nature.","DOI":"10.1007\/978-3-319-03419-5"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1093\/comjnl\/bxaa113","article-title":"A machine learning-based model to evaluate readability and assess grade level for the web pages","volume":"65","author":"Pantula","year":"2022","journal-title":"Comput. J."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1162\/coli_a_00398","article-title":"Supervised and unsupervised neural approaches to text readability","volume":"47","author":"Martinc","year":"2021","journal-title":"Comput. Linguist."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"340","DOI":"10.3758\/s13428-014-0459-x","article-title":"Constructing and validating readability models: The method of integrating multilevel linguistic features with machine learning","volume":"47","author":"Sung","year":"2015","journal-title":"Behav. Res. Methods"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1007\/s40593-020-00201-7","article-title":"Applying natural language processing and hierarchical machine learning approaches to text difficulty classification","volume":"30","author":"Balyan","year":"2020","journal-title":"Int. J. Artif. Intell. Educ."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Demner-Fushman, D., Elhadad, N., and Friedman, C. (2021). Natural language processing for health-related texts. Biomedical Informatics: Computer Applications in Health Care and Biomedicine, Springer International Publishing.","DOI":"10.1007\/978-3-030-58721-5_8"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Curiel, A., Guti\u00e9rrez-Soto, C., and Rojano-C\u00e1ceres, J.R. (2021). An online multi-source summarization algorithm for text readability in topic-based search. Comput. Speech Lang., 66.","DOI":"10.1016\/j.csl.2020.101143"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"88608","DOI":"10.1109\/ACCESS.2024.3418844","article-title":"Readability Grading Based on Multidimensional Linguistics Features for International Chinese Language Education","volume":"12","author":"Zhang","year":"2024","journal-title":"IEEE Access"},{"key":"ref_23","unstructured":"Zhu, S., Song, J., Peng, W., Guo, D., and Wu, G. (2020). Text readability assessment for Chinese second language teaching. Chinese Lexical Semantics: 20th Workshop, CLSW 2019, Beijing, China, 28\u201330 June 2019, Revised Selected Papers 20, Springer International Publishing."},{"key":"ref_24","first-page":"3159","article-title":"Towards a robust deep neural network against adversarial texts: A survey","volume":"35","author":"Wang","year":"2021","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_25","first-page":"184","article-title":"Text Readability Evaluation in Higher Education Using CNNs","volume":"1","author":"Zulqarnain","year":"2023","journal-title":"J. Ind. Intell."},{"key":"ref_26","unstructured":"(2025, August 15). Taiwan Benchmarks for the Chinese Language. Available online: https:\/\/bcoct.naer.edu.tw\/TBCL\/index.md."},{"key":"ref_27","unstructured":"(2025, August 15). CkipTagger. Available online: https:\/\/ckip.iis.sinica.edu.tw\/service\/ckiptagger\/."},{"key":"ref_28","unstructured":"Jina, A.I. (2025, August 15). jina-embeddings-v2-base-zh. Hugging Face. Available online: https:\/\/huggingface.co\/jinaai\/jina-embeddings-v2-base-zh."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1023\/A:1022627411411","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Hochreiter, S. (1997). Long Short-Term Memory, Neural Computation MIT-Press.","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1016\/j.neucom.2018.04.045","article-title":"LSTM with sentence representations for document-level sentiment classification","volume":"308","author":"Rao","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_32","unstructured":"Melamud, O., Goldberger, J., and Dagan, I. (2025, November 15). context2vec: Learning Generic Context Embedding with Bidirectional LSTM. Available online: https:\/\/aclanthology.org\/K16-1006\/."},{"key":"ref_33","unstructured":"(2025, November 15). Pytorch. Available online: https:\/\/pytorch.org\/."},{"key":"ref_34","first-page":"75","article-title":"Investigating Chinese Text Readability: Linguistic Features, Modeling, and Validation","volume":"55","author":"Sung","year":"2012","journal-title":"Chin. J. Psychol."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/12\/777\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T15:50:39Z","timestamp":1765295439000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/12\/777"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,9]]},"references-count":34,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["a18120777"],"URL":"https:\/\/doi.org\/10.3390\/a18120777","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,9]]}}}