{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T20:03:15Z","timestamp":1767211395350,"version":"build-2065373602"},"reference-count":58,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2025,7,25]],"date-time":"2025-07-25T00:00:00Z","timestamp":1753401600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012190","name":"Development of a methodology for instrumental base formation for analysis and modeling of the spatial socio-economic development of systems based on internal reserves in the context of digitalization","doi-asserted-by":"publisher","award":["FSEG-2023-0008"],"award-info":[{"award-number":["FSEG-2023-0008"]}],"id":[{"id":"10.13039\/501100012190","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>With the exponential growth of textual data, traditional topic modeling methods based on static analysis demonstrate limited effectiveness in tracking the dynamics of thematic content. This research aims to develop a method for quantifying the dynamics of topics within text corpora using a thematic signal (TS) function that accounts for temporal changes and semantic relationships. The proposed method combines associative tokens with original lexical units to reduce thematic entropy and information noise. Approaches employed include topic modeling (LDA), vector representations of texts (TF-IDF, Word2Vec), and time series analysis. The method was tested on a corpus of news texts (5000 documents). Results demonstrated robust identification of semantically meaningful thematic clusters. An inverse relationship was observed between the level of thematic significance and semantic diversity, confirming a reduction in entropy using the proposed method. This approach allows for quantifying topic dynamics, filtering noise, and determining the optimal number of clusters. Future applications include analyzing multilingual data and integration with neural network models. The method shows potential for monitoring information flows and predicting thematic trends.<\/jats:p>","DOI":"10.3390\/bdcc9080197","type":"journal-article","created":{"date-parts":[[2025,7,25]],"date-time":"2025-07-25T11:38:41Z","timestamp":1753443521000},"page":"197","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Integration of Associative Tokens into Thematic Hyperspace: A Method for Determining Semantically Significant Clusters in Dynamic Text Streams"],"prefix":"10.3390","volume":"9","author":[{"given":"Dmitriy","family":"Rodionov","sequence":"first","affiliation":[{"name":"Higher School of Engineering and Economics, Institute of Industrial Management, Economics and Trade, Peter the Great St. Petersburg Polytechnic University, 195251 Saint-Petersburg, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5153-7727","authenticated-orcid":false,"given":"Boris","family":"Lyamin","sequence":"additional","affiliation":[{"name":"Higher School of Engineering and Economics, Institute of Industrial Management, Economics and Trade, Peter the Great St. Petersburg Polytechnic University, 195251 Saint-Petersburg, Russia"}]},{"given":"Evgenii","family":"Konnikov","sequence":"additional","affiliation":[{"name":"Higher School of Engineering and Economics, Institute of Industrial Management, Economics and Trade, Peter the Great St. Petersburg Polytechnic University, 195251 Saint-Petersburg, Russia"}]},{"given":"Elena","family":"Obukhova","sequence":"additional","affiliation":[{"name":"Higher School of Engineering and Economics, Institute of Industrial Management, Economics and Trade, Peter the Great St. Petersburg Polytechnic University, 195251 Saint-Petersburg, Russia"}]},{"given":"Gleb","family":"Golikov","sequence":"additional","affiliation":[{"name":"Higher School of Engineering and Economics, Institute of Industrial Management, Economics and Trade, Peter the Great St. Petersburg Polytechnic University, 195251 Saint-Petersburg, Russia"}]},{"given":"Prokhor","family":"Polyakov","sequence":"additional","affiliation":[{"name":"Higher School of Engineering and Economics, Institute of Industrial Management, Economics and Trade, Peter the Great St. Petersburg Polytechnic University, 195251 Saint-Petersburg, Russia"}]}],"member":"1968","published-online":{"date-parts":[[2025,7,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"100128","DOI":"10.1016\/j.nlp.2025.100128","article-title":"A Survey on Chatbots and Large Language Models: Testing and Evaluation Techniques","volume":"10","author":"Singh","year":"2025","journal-title":"Nat. Lang. Process. J."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1016\/j.jor.2024.12.039","article-title":"Large Language Models in Orthopedics: An Exploratory Research Trend Analysis and Machine Learning Classification","volume":"66","author":"Garcia","year":"2025","journal-title":"J. Orthop."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"100205","DOI":"10.1016\/j.mcpdig.2025.100205","article-title":"A Systematic Review of Natural Language Processing Techniques for Early Detection of Cognitive Impairment","volume":"3","author":"Shankar","year":"2025","journal-title":"Mayo Clin. Proc. Digit. Health"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"e54653","DOI":"10.2196\/54653","article-title":"Accelerating Evidence Synthesis in Observational Studies: Development of a Living Natural Language Processing\u2013Assisted Intelligent Systematic Literature Review System","volume":"12","author":"Manion","year":"2024","journal-title":"JMIR Med. Inform."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1016\/j.procs.2023.12.087","article-title":"An Enhanced Research Productivity Monitoring System for Higher Education Institutions (HEI\u2019s) with Natural Language Processing (NLP)","volume":"230","author":"Regla","year":"2023","journal-title":"Procedia Comput. Sci."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"103479","DOI":"10.1016\/j.scs.2021.103479","article-title":"A Machine Learning Approach for Integration of Spatial Development Plans Based on Natural Language Processing","volume":"76","author":"Kaczmarek","year":"2022","journal-title":"Sustain. Cities Soc."},{"key":"ref_7","first-page":"100310","article-title":"Automating Materiality Assessment with a Data-Driven Document-Based Approach","volume":"5","author":"Francia","year":"2025","journal-title":"Int. J. Inf. Manag. Data Insights"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"114269","DOI":"10.1016\/j.dss.2024.114269","article-title":"Selecting Textual Analysis Tools to Classify Sustainability Information in Corporate Reporting","volume":"183","author":"Maibaum","year":"2024","journal-title":"Decis. Support Syst."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1016\/j.jum.2022.05.004","article-title":"Artificial Intelligence, Institutions, and Resilience: Prospects and Provocations for Cities","volume":"11","author":"Schintler","year":"2022","journal-title":"J. Urban Manag."},{"key":"ref_10","first-page":"102","article-title":"Big Data Techniques in Auditing Research and Practice: Current Trends and Future Opportunities","volume":"40","author":"Gepp","year":"2018","journal-title":"J. Account. Lit."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A Mathematical Theory of Communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J."},{"key":"ref_12","unstructured":"Rosen, R. (1991). Life Itself: A Comprehensive Inquiry into the Nature, Origin, and Fabrication of Life, Columbia University Press."},{"key":"ref_13","unstructured":"Gershenfeld, N. (2000). The Physics of Information Technology, Cambridge University Press."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1147\/rd.53.0183","article-title":"Irreversibility and Heat Generation in the Computing Process","volume":"5","author":"Landauer","year":"1961","journal-title":"IBM J. Res. Dev."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bekenstein, J.D. (2020). Black Holes and the Second Law. Jacob Bekenstein: The Conservative Revolutionary, World Scientific.","DOI":"10.1142\/9789811203961_0022"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ellerman, D. (2021). Introduction to Logical Entropy and Its Relationship to Shannon Entropy. arXiv.","DOI":"10.2139\/ssrn.3978011"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Xu, P., Sayyari, Y., and Butt, S.I. (2022). Logical Entropy of Information Sources. Entropy, 24.","DOI":"10.3390\/e24091174"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"\u00c7engel, Y.A. (2023). A Concise Account of Information as Meaning Ascribed to Symbols and Its Association with Conscious Mind. Entropy, 25.","DOI":"10.3390\/e25010177"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Manzotti, R. (2025). A Deflationary Account of Information in Terms of Probability. Entropy, 27.","DOI":"10.3390\/e27050514"},{"key":"ref_20","unstructured":"Wang, S., Zhang, T., and Xi, B. (2011). Information Computing and Applications, Springer."},{"key":"ref_21","unstructured":"McKenzie, D.P. (1977). Plate Tectonics and Its Relationship to the Evolution of Ideas in the Geological Sciences. Daedalus, 97\u2013124."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1134\/S0006350916010061","article-title":"Natural-Constructive Approach to Modeling the Cognitive Process","volume":"61","author":"Chernavskaya","year":"2016","journal-title":"Biophysics"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"111359","DOI":"10.1016\/j.jss.2022.111359","article-title":"Data Management for Production Quality Deep Learning Models: Challenges and Solutions","volume":"191","author":"Munappy","year":"2022","journal-title":"J. Syst. Softw."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"102510","DOI":"10.1016\/j.lrp.2025.102510","article-title":"Executive Training as a Turning Point in Strategic Renewal Processes","volume":"58","author":"Nevalainen","year":"2025","journal-title":"Long Range Plann."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"528","DOI":"10.3724\/SP.J.1041.2020.00528","article-title":"Analysis of the Problem-Solving Strategies in Computer-Based Dynamic Assessment: The Extension and Application of Multilevel Mixture IRT Model","volume":"52","author":"Li","year":"2020","journal-title":"Acta Psychol. Sin."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3507900","article-title":"The Evolution of Topic Modeling","volume":"54","author":"Churchill","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"101582","DOI":"10.1016\/j.is.2020.101582","article-title":"A Review of Topic Modeling Methods","volume":"94","author":"Vayansky","year":"2020","journal-title":"Inf. Syst."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1016\/j.patcog.2018.04.008","article-title":"Learning Bag-of-Embedded-Words Representations for Textual Information Retrieval","volume":"81","author":"Passalis","year":"2018","journal-title":"Pattern Recognit."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"103782","DOI":"10.1016\/j.robot.2021.103782","article-title":"Modest-Vocabulary Loop-Closure Detection with Incremental Bag of Tracked Words","volume":"141","author":"Tsintotas","year":"2021","journal-title":"Rob. Auton. Syst."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1016\/j.patrec.2020.03.003","article-title":"Improving FastText with Inverse Document Frequency of Subwords","volume":"133","author":"Choi","year":"2020","journal-title":"Pattern Recognit. Lett."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1016\/j.eswa.2019.07.022","article-title":"Novel Term Weighting Schemes for Document Representation Based on Ranking of Terms and Fuzzy Logic with Semantic Relationship of Terms","volume":"137","author":"Lakshmi","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"110215","DOI":"10.1016\/j.knosys.2022.110215","article-title":"Supervised Term-Category Feature Weighting for Improved Text Classification","volume":"261","author":"Attieh","year":"2023","journal-title":"Knowl.-Based Syst."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"113401","DOI":"10.1016\/j.eswa.2020.113401","article-title":"Word2vec-Based Latent Semantic Analysis (W2V-LSA) for Topic Modeling: A Study on Blockchain Technology Trend Analysis","volume":"152","author":"Kim","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"102110","DOI":"10.1016\/j.datak.2022.102110","article-title":"Ontology-Based Semantic Retrieval of Documents Using Word2vec Model","volume":"144","author":"Sharma","year":"2023","journal-title":"Data Knowl. Eng."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"398","DOI":"10.1016\/j.procs.2024.11.126","article-title":"NLP and Topic Modeling with LDA, LSA, and NMF for Monitoring Psychosocial Well-Being in Monthly Surveys","volume":"251","author":"Rkia","year":"2024","journal-title":"Procedia Comput. Sci."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1016\/j.procs.2024.03.027","article-title":"Decision Support Model in Compiling Owner Estimate for Fmcgs Products from Various Marketplaces with Tf-Idf and Lsa-Based Clustering","volume":"234","author":"Indasari","year":"2024","journal-title":"Procedia Comput. Sci."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"114310","DOI":"10.1016\/j.dss.2024.114310","article-title":"Approaches to Improve Preprocessing for Latent Dirichlet Allocation Topic Modeling","volume":"185","author":"Zimmermann","year":"2024","journal-title":"Decis. Support Syst."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"15169","DOI":"10.1007\/s11042-018-6894-4","article-title":"Latent Dirichlet Allocation (LDA) and Topic Modeling: Models, Applications, a Survey","volume":"78","author":"Jelodar","year":"2019","journal-title":"Multimed. Tools Appl."},{"key":"ref_39","unstructured":"Grootendorst, M. (2022). BERTopic: Neural Topic Modeling with a Class-Based TF-IDF Procedure. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.cosrev.2017.10.002","article-title":"The Evolution of Sentiment Analysis\u2014A Review of Research Topics, Venues, and Top Cited Papers","volume":"27","author":"Graziotin","year":"2018","journal-title":"Comput. Sci. Rev."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"110284","DOI":"10.1016\/j.engappai.2025.110284","article-title":"Validation and Extraction of Reliable Information through Automated Scraping and Natural Language Inference","volume":"147","author":"Shah","year":"2025","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"107094","DOI":"10.1016\/j.patcog.2019.107094","article-title":"Estimation of Ergodicity Limits of Bag-of-Words Modeling for Guaranteed Stochastic Convergence","volume":"99","author":"Ghalyan","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"109482","DOI":"10.1016\/j.patcog.2023.109482","article-title":"Capacitive Empirical Risk Function-Based Bag-of-Words and Pattern Classification Processes","volume":"139","author":"Ghalyan","year":"2023","journal-title":"Pattern Recognit."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"108987","DOI":"10.1016\/j.asoc.2022.108987","article-title":"Weighting Construction by Bag-of-Words with Similarity-Learning and Supervised Training for Classification Models in Court Text Documents","volume":"124","author":"Junior","year":"2022","journal-title":"Appl. Soft Comput."},{"key":"ref_45","first-page":"3395","article-title":"Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using Tf-Idf, Word2vec, and Bert","volume":"81","author":"Almazaydeh","year":"2024","journal-title":"Comput. Mater. Contin"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/j.ins.2018.10.006","article-title":"Multi-Co-Training for Document Classification Using Various Document Representations: TF\u2013IDF, LDA, and Doc2Vec","volume":"477","author":"Kim","year":"2019","journal-title":"Inf. Sci."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1016\/j.procs.2024.10.237","article-title":"Application of the Bidirectional Long Short-Term Memory Method with Comparison of Word2Vec, GloVe, and FastText for Emotion Classification in Song Lyrics","volume":"245","author":"Shaday","year":"2024","journal-title":"Procedia Comput. Sci."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"e35945","DOI":"10.1016\/j.heliyon.2024.e35945","article-title":"Investigating Response Behavior through TF-IDF and Word2vec Text Analysis: A Case Study of PISA 2012 Problem-Solving Process Data","volume":"10","author":"Zhou","year":"2024","journal-title":"Heliyon"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"474","DOI":"10.1016\/j.procs.2023.10.548","article-title":"Philippine Court Case Summarizer Using Latent Semantic Analysis","volume":"227","author":"Sagum","year":"2023","journal-title":"Procedia Comput. Sci."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1016\/j.eswa.2019.03.001","article-title":"Latent Dirichlet Allocation (LDA) for Topic Modeling of the CFPB Consumer Complaints","volume":"127","author":"Bastani","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1016\/j.procs.2019.11.277","article-title":"Latent Dirichlet Allocation (LDA) for Improving the Topic Modeling of the Official Bulletin of the Spanish State (BOE)","volume":"162","author":"Cobo","year":"2019","journal-title":"Procedia Comput. Sci."},{"key":"ref_52","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019, January 2\u20137). Bert: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"113111","DOI":"10.1016\/j.eswa.2019.113111","article-title":"Unusual Customer Response Identification and Visualization Based on Text Mining and Anomaly Detection","volume":"144","author":"Seo","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"113682","DOI":"10.1016\/j.eswa.2020.113682","article-title":"Enhancing Web Service Clustering Using Length Feature Weight Method for Service Description Document Vector Space Representation","volume":"161","author":"Agarwal","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"748","DOI":"10.1016\/j.istruc.2021.11.012","article-title":"Improved Arithmetic Optimization Algorithm and Its Application to Discrete Structural Optimization","volume":"35","author":"Kaveh","year":"2022","journal-title":"Structures"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"105436","DOI":"10.1016\/j.knosys.2019.105436","article-title":"Bag-of-Concepts Representation for Document Classification Based on Automatic Knowledge Acquisition from Probabilistic Knowledge Base","volume":"193","author":"Li","year":"2020","journal-title":"Knowl.-Based Syst."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"113609","DOI":"10.1016\/j.cma.2020.113609","article-title":"The Arithmetic Optimization Algorithm","volume":"376","author":"Abualigah","year":"2021","journal-title":"Comput. Methods Appl. Mech. Eng."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"R\u00f6der, M., Both, A., and Hinneburg, A. (2015, January 2\u20136). Exploring the Space of Topic Coherence Measures. Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, Shanghai, China.","DOI":"10.1145\/2684822.2685324"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/8\/197\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:15:55Z","timestamp":1760033755000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/8\/197"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,25]]},"references-count":58,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2025,8]]}},"alternative-id":["bdcc9080197"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9080197","relation":{},"ISSN":["2504-2289"],"issn-type":[{"type":"electronic","value":"2504-2289"}],"subject":[],"published":{"date-parts":[[2025,7,25]]}}}