{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T19:48:05Z","timestamp":1769284085410,"version":"3.49.0"},"reference-count":43,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,6,12]],"date-time":"2025-06-12T00:00:00Z","timestamp":1749686400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006469","name":"Macao Science and Technology Development Fund","doi-asserted-by":"publisher","award":["0048\/2021\/APD"],"award-info":[{"award-number":["0048\/2021\/APD"]}],"id":[{"id":"10.13039\/501100006469","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006469","name":"Macao Science and Technology Development Fund","doi-asserted-by":"publisher","award":["RP\/FCA-10\/2022"],"award-info":[{"award-number":["RP\/FCA-10\/2022"]}],"id":[{"id":"10.13039\/501100006469","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Macao Polytechnic University","award":["0048\/2021\/APD"],"award-info":[{"award-number":["0048\/2021\/APD"]}]},{"name":"Macao Polytechnic University","award":["RP\/FCA-10\/2022"],"award-info":[{"award-number":["RP\/FCA-10\/2022"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Applied Sciences"],"abstract":"<jats:p>This study presents WordMap, an integrated text mining application developed to enhance the efficiency and usability of text analysis over a network. As unstructured text data continues to grow across domains, effective tools for segmentation and topic modeling have become increasingly essential for extracting insightful information. However, most existing solutions depend on multiple disconnected tools, and these often compromise workflow efficiency and user experience. Unlike traditional tools, WordMap combines corpus segmentation, topic modeling, and result visualization into a unified workflow for both Chinese and English languages, thereby reducing workflow fragmentation and lowering the user threshold. To assess usability and user acceptance, this research adopts the Technology Acceptance Model (TAM). WordMap employs PKUSEG and NLTK for bilingual corpus segmentation, utilizes BERTopic for dynamic topic modeling, and integrates interactive visualization to enable intuitive analysis. The PLS-SEM result shows that the perceived ease of use (PEOU) has a significant impact on both perceived usefulness (PU) and user attitude (ATT), while ATT strongly predicts behavioral intention (BI) (\u03b2 = 0.674, p &lt; 0.001). The results indicate that integrating core text mining processes into a user-centered design significantly boosts user satisfaction and adoption. By combining key processes and empirically validating user perceptions, the proposed framework facilitates the development of efficient and accessible text mining tools. It offers both theoretical and practical insights for future advancement and deployment in the field of text mining.<\/jats:p>","DOI":"10.3390\/app15126632","type":"journal-article","created":{"date-parts":[[2025,6,12]],"date-time":"2025-06-12T11:47:07Z","timestamp":1749728827000},"page":"6632","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["WordMap: Text Mining Application of Enhanced Corpus Segmentation and Semantic Topic Recognition"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-7965-1584","authenticated-orcid":false,"given":"Zhijian","family":"Wei","sequence":"first","affiliation":[{"name":"Faculty of Applied Sciences, Macao Polytechnic University, Macao, China"}]},{"given":"Huiwen","family":"Zou","sequence":"additional","affiliation":[{"name":"Faculty of Applied Sciences, Macao Polytechnic University, Macao, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8820-5443","authenticated-orcid":false,"given":"Patrick Cheong-Iao","family":"Pang","sequence":"additional","affiliation":[{"name":"Faculty of Applied Sciences, Macao Polytechnic University, Macao, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0551-5938","authenticated-orcid":false,"given":"Penny Wong-On","family":"Chao","sequence":"additional","affiliation":[{"name":"Faculty of Applied Sciences, Macao Polytechnic University, Macao, China"}]},{"given":"Benjamin K.","family":"Ng","sequence":"additional","affiliation":[{"name":"Faculty of Applied Sciences, Macao Polytechnic University, Macao, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,6,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1186\/s40691-021-00281-6","article-title":"Analyzing genderless fashion trends of consumers\u2019 perceptions on social media: Using unstructured big data analysis through Latent Dirichlet Allocation-based topic modeling","volume":"9","author":"Kim","year":"2022","journal-title":"Fash. Text."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"122386","DOI":"10.1016\/j.techfore.2023.122386","article-title":"The power of big data analytics over fake news: A scientometric review of Twitter as a predictive system in healthcare","volume":"190","year":"2023","journal-title":"Technol. Forecast. Soc. Change"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1016\/j.jbusres.2022.03.054","article-title":"The role of consumer data in marketing: A research agenda","volume":"146","author":"Lee","year":"2022","journal-title":"J. Bus. Res."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1016\/j.procs.2023.12.074","article-title":"Web Scraping using Natural Language Processing: Exploiting Unstructured Text for Data Extraction and Analysis","volume":"230","author":"Pichiyan","year":"2023","journal-title":"Procedia Comput. Sci."},{"key":"ref_5","first-page":"1","article-title":"An Empirical Comparison of Four Text Mining Methods","volume":"51","author":"Lee","year":"2010","journal-title":"J. Comput. Inf. Syst."},{"key":"ref_6","first-page":"329","article-title":"The application of text mining methods in innovation research: Current state, evolution patterns, and development priorities","volume":"50","author":"Antons","year":"2020","journal-title":"RD Manag."},{"key":"ref_7","unstructured":"Lee, G.G., Ginzburg, J., Gardent, C., and Stent, A. (2012, January 5\u20136). A Reranking Model for Discourse Segmentation using Subtree Features. Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Seoul, Republic of Korea. Available online: https:\/\/aclanthology.org\/W12-1623\/."},{"key":"ref_8","first-page":"933","article-title":"Latent Dirichlet Allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_9","unstructured":"Lee, D., and Seung, H.S. (2000). Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation. Available online: https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2000\/hash\/f9d1152547c0bde01830b7e8bd60024c-Abstract.html."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.ipm.2008.06.002","article-title":"Automatic generic document summarization based on non-negative matrix factorization","volume":"45","author":"Lee","year":"2009","journal-title":"Inf. Process. Manag."},{"key":"ref_11","first-page":"36645","article-title":"Investigating COVID-19 News Across Four Nations: A Topic Modeling and Sentiment Analysis Approach","volume":"9","author":"Ghasiya","year":"2021","journal-title":"IEEE Access Pract. Innov. Open Solut."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"e66365","DOI":"10.2196\/66365","article-title":"Identifying Kidney Stone Risk Factors Through Patient Experiences with a Large Language Model: Text Analysis and Empirical Study","volume":"27","author":"Zhu","year":"2025","journal-title":"J. Med. Internet Res."},{"key":"ref_13","first-page":"24","article-title":"News recommendations based on collaborative topic modeling and collaborative filtering with generative adversarial networks","volume":"58","author":"Liu","year":"2024","journal-title":"Data Technol. Appl."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2733","DOI":"10.1109\/JBHI.2020.3001216","article-title":"Deep Sentiment Classification and Topic Discovery on Novel Coronavirus or COVID-19 Online Discussions: NLP Using LSTM Recurrent Neural Network Approach","volume":"24","author":"Jelodar","year":"2020","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"ref_15","first-page":"295","article-title":"Efficient daily news platform generation using natural language processing","volume":"11","author":"Devadoss","year":"2019","journal-title":"Int. J. Inf. Technol."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"139265","DOI":"10.1016\/j.jclepro.2023.139265","article-title":"Toward an ecological civilization: Exploring changes in China\u2019s land use policy over the past 35 years using text mining","volume":"427","author":"Song","year":"2023","journal-title":"J. Clean. Prod."},{"key":"ref_17","unstructured":"Lu, H., and Cai, J. (2024). Experimental Comparison of Three Topic Modeling Methods with LDA, Top2Vec and BERTopic. Artificial Intelligence and Robotics. ISAIR 2023, Springer. Communications in Computer and Information Science;."},{"key":"ref_18","unstructured":"Mitkov, R., Ezzini, S., Ranasinghe, T., Ezeani, I., Khallaf, N., Acarturk, C., Bradbury, M., El-Haj, M., and Rayson, P. (2024, January 29\u201330). U-BERTopic: An Urgency-Aware BERT-Topic Modeling Approach for Detecting CyberSecurity Issues via Social Media. Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, Lancaster, UK. Available online: https:\/\/aclanthology.org\/2024.nlpaics-1.22\/."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1007\/s13042-023-01885-8","article-title":"Semantic rule-based information extraction for meteorological reports","volume":"15","author":"Cui","year":"2024","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Palmer, D.D. (1997, January 7\u201312). A Trainable Rule-Based Algorithm for Word Segmentation. Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, Madrid, Spain.","DOI":"10.3115\/976909.979658"},{"key":"ref_21","first-page":"159","article-title":"Twitter Emotion Analysis in Earthquake Situations","volume":"4","author":"Vo","year":"2013","journal-title":"Int. J. Comput. Linguist. Appl."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1023\/A:1007506220214","article-title":"Statistical Models for Text Segmentation","volume":"34","author":"Beeferman","year":"1999","journal-title":"Mach. Learn."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Khare, R., and An, Y. (2009, January 2\u20136). An empirical study on using hidden markov model for search interface segmentation. Proceedings of the 18th ACM Conference on Information and Knowledge Management, Hong Kong, China.","DOI":"10.1145\/1645953.1645959"},{"key":"ref_24","first-page":"1264","article-title":"A Chinese word segmentation algorithm based on maximum entropy","volume":"3","author":"Zhang","year":"2010","journal-title":"Int. Conf. Mach. Learn. Cybern."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Vemulapalli, R., Tuzel, O., Liu, M.-Y., and Chellapa, R. (2025, January 09). Gaussian Conditional Random Field Network for Semantic Segmentation. Available online: https:\/\/openaccess.thecvf.com\/content_cvpr_2016\/html\/Vemulapalli_Gaussian_Conditional_Random_CVPR_2016_paper.html.","DOI":"10.1109\/CVPR.2016.351"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Xue, N., and Shen, L. (2003, January 11\u201312). Chinese Word Segmentation as LMR Tagging. Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Sapporo, Japan.","DOI":"10.3115\/1119250.1119278"},{"key":"ref_27","unstructured":"Zhao, Y., and Fu, G. (2013, January 22\u201325). A MEMs-based Labeling Approach to Punctuation Correction in Chinese Opinionated Text. Proceedings of the International Conference on Artificial Intelligence (ICAI), Las Vegas, NV, USA."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1016\/j.neucom.2017.12.058","article-title":"CRF based text detection for natural scene images using convolutional neural network and context information","volume":"295","author":"Wang","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Koshorek, O., Cohen, A., Mor, N., Rotman, M., and Berant, J. (2018). Text Segmentation as a Supervised Learning Task. arXiv.","DOI":"10.18653\/v1\/N18-2075"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Iosifov, I., Iosifova, O., and Sokolov, V. (2020, January 6\u20139). Sentence Segmentation from Unformatted Text using Language Modeling and Sequence Labeling Approaches. Proceedings of the 2020 IEEE International Conference on Problems of Infocommunications. Science and Technology (PIC S&T), Kharkiv, Ukraine.","DOI":"10.1109\/PICST51311.2020.9468084"},{"key":"ref_31","unstructured":"Luo, R., Xu, J., Zhang, Y., Zhang, Z., Ren, X., and Sun, X. (2022). PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Santosh, K.C., and Hegadi, R.S. (2019). Review on Natural Language Processing Trends and Techniques Using NLTK. Recent Trends in Image Processing and Pattern Recognition, Springer.","DOI":"10.1007\/978-981-13-9187-3_67"},{"key":"ref_33","unstructured":"Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv."},{"key":"ref_34","unstructured":"Albanese, N.C. (2025, January 15). Topic Modeling with LSA, pLSA, LDA, NMF, BERTopic, Top2Vec: A Comparison; Towards Data Science: 2022. Available online: https:\/\/towardsdatascience.com\/topic-modeling-with-lsa-plsa-lda-nmf-bertopic-top2vec-a-comparison-5e6ce4b1e4a5\/."},{"key":"ref_35","unstructured":"Thampi, S.M., Chaudhary, V., Pathan, A.-S.K., Li, K.C., and Krishnaswamy, D. (2025). Title-Based Topic Modeling on E-learning Web Content Titles Using BERTopic Model. Fifth International Conference on Computing and Network Communications, Springer Nature."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Rachel J., J.L., Bhuvaneswari, A., and Kumudha, M. (2024, January 25\u201327). Topic Modeling Based Clustering of Disaster Tweets Using BERTopic. Proceedings of the 2024 MIT Art, Design and Technology School of Computing International Conference (MITADTSoCiCon), Pune, India.","DOI":"10.1109\/MITADTSoCiCon60330.2024.10575555"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Brambilla, M., Chbeir, R., Frasincar, F., and Manolescu, I. (2021). Visualizing Web Users\u2019 Attention to Text with Selection Heatmaps. Web Engineering, Springer International Publishing.","DOI":"10.1007\/978-3-030-74296-6_49"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.im.2013.12.003","article-title":"The adoption of software measures: A technology acceptance model (TAM) perspective","volume":"51","author":"Wallace","year":"2014","journal-title":"Inf. Manag."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1108\/EBR-11-2018-0203","article-title":"When to use and how to report the results of PLS-SEM","volume":"31","author":"Hair","year":"2019","journal-title":"Eur. Bus. Rev."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"12706","DOI":"10.5465\/AMBPP.2017.12706abstract","article-title":"Current Approaches for Assessing Convergent and Discriminant Validity with SEM: Issues and Solutions","volume":"2017","author":"Cheung","year":"2017","journal-title":"Acad. Manag. Proc."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"425","DOI":"10.2307\/30036540","article-title":"User Acceptance of Information Technology: Toward a Unified View","volume":"27","author":"Venkatesh","year":"2003","journal-title":"MIS Q."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"49007","DOI":"10.1109\/ACCESS.2025.3549733","article-title":"Performance Evaluation and Application Potential of Small Large Language Models in Complex Sentiment Analysis Tasks","volume":"13","author":"Guo","year":"2025","journal-title":"IEEE Access"},{"key":"ref_43","first-page":"540","article-title":"Roles of Information Propagation of Chinese Microblogging Users in Epidemics: A Crisis Management Perspective","volume":"31","author":"Si","year":"2021","journal-title":"Internet Res."}],"container-title":["Applied Sciences"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2076-3417\/15\/12\/6632\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:50:56Z","timestamp":1760032256000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2076-3417\/15\/12\/6632"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,12]]},"references-count":43,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,6]]}},"alternative-id":["app15126632"],"URL":"https:\/\/doi.org\/10.3390\/app15126632","relation":{},"ISSN":["2076-3417"],"issn-type":[{"value":"2076-3417","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,12]]}}}