{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T23:26:40Z","timestamp":1777418800774,"version":"3.51.4"},"reference-count":33,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T00:00:00Z","timestamp":1770681600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003193","name":"Ministry of Education, Science, Research and Sport of the Slovak Republic","doi-asserted-by":"publisher","award":["KEGA 075TUKE-4\/2024"],"award-info":[{"award-number":["KEGA 075TUKE-4\/2024"]}],"id":[{"id":"10.13039\/501100003193","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003193","name":"Ministry of Education, Science, Research and Sport of the Slovak Republic","doi-asserted-by":"publisher","award":["KEGA 054TUKE-4\/2024"],"award-info":[{"award-number":["KEGA 054TUKE-4\/2024"]}],"id":[{"id":"10.13039\/501100003193","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005357","name":"Slovak Research and Development Agency","doi-asserted-by":"publisher","award":["APVV-22-0576"],"award-info":[{"award-number":["APVV-22-0576"]}],"id":[{"id":"10.13039\/501100005357","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>This research presents a large-scale thematic analysis of 66,002 Slovak university thesis abstracts, aimed at identifying, categorizing, and visualizing research trends across multiple academic disciplines. Using BERTopic for unsupervised topic modeling with K-Means clustering, 3000 distinct thematic clusters were extracted through rigorous coherence optimization, with each topic characterized by representative keywords derived from class-based TF-IDF weighting. 
Text embeddings were generated using SlovakBERT-STS, a domain-adapted Slovak BERT model fine-tuned for semantic textual similarity, producing 768-dimensional vectors that enable precise computation of cosine similarity between topics, resulting in a 3000 \u00d7 3000 topic similarity matrix. The optimal topic count was determined through systematic evaluation of K values ranging from 1000 to 10,000, with K = 3000 identified as the optimal configuration based on coherence elbow analysis, yielding a mean coherence score of 0.433. Thematic relationships were visualized through Multidimensional Scaling (MDS) projection to 3-D space, where convex hull geometries reveal semantic boundaries and topic separability. The methodology incorporates dynamic stopword filtering, Stanza-based lemmatization for Slovak morphology, and UMAP dimensionality reduction, achieving a balanced distribution of approximately 22 abstracts per topic. Results demonstrate that fine-grained topic models with 3000 clusters can extract meaningful semantic structure from multi-domain, morphologically complex Slovak academic corpora, despite inherent coherence constraints. 
The reproducible pipeline provides a framework for large-scale topic discovery, coherence-driven optimization, and geometric visualization of thematic relationships in academic text collections.<\/jats:p>","DOI":"10.3390\/info17020180","type":"journal-article","created":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T15:23:14Z","timestamp":1770736994000},"page":"180","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Convex Hull-Based Topic Similarity Mapping in Multidimensional Data"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-8441-4959","authenticated-orcid":false,"given":"Mat\u00fa\u0161","family":"Pohorenec","sequence":"first","affiliation":[{"name":"Faculty of Civil Engineering, Institute of Construction Technology, Economics and Management, Technical University of Kosice, 042 00 Kosice, Slovakia"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-6038-2993","authenticated-orcid":false,"given":"Vladislav","family":"Vavr\u00e1k","sequence":"additional","affiliation":[{"name":"Faculty of Mining, Ecology, Process Control and Geotechnologies, Institute of Logistics and Transport, Technical University of Kosice, 042 00 Kosice, Slovakia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6310-6046","authenticated-orcid":false,"given":"Annam\u00e1ria","family":"Beh\u00fanov\u00e1","sequence":"additional","affiliation":[{"name":"Faculty of Mining, Ecology, Process Control and Geotechnologies, Institute of Logistics and Transport, Technical University of Kosice, 042 00 Kosice, Slovakia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8130-9664","authenticated-orcid":false,"given":"Marcel","family":"Beh\u00fan","sequence":"additional","affiliation":[{"name":"Faculty of Mining, Ecology, Process Control and Geotechnologies, Institute of Earth Resources, Technical University of Kosice, 042 00 Kosice, 
Slovakia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6119-895X","authenticated-orcid":false,"given":"Michal","family":"Ennert","sequence":"additional","affiliation":[{"name":"Department of Development, Operation and Integration of Information Systems, Institute of Computer Technology, Technical University of Kosice, 042 00 Kosice, Slovakia"}]}],"member":"1968","published-online":{"date-parts":[[2026,2,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Egger, R., and Yu, J. (2022). A topic modeling comparison between lda, nmf, top2vec, and bertopic to demystify twitter posts. Front. Sociol., 7.","DOI":"10.3389\/fsoc.2022.886498"},{"key":"ref_2","first-page":"185","article-title":"Unsupervised machine learning approaches in nlp: A comparative study of topic modeling with bertopic and lda","volume":"12","author":"Sy","year":"2024","journal-title":"Int. J. Intell. Syst. Appl. Eng."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Albalawi, R., Yeap, T.-H., and Benyoucef, M. (2020). Using topic modeling methods for short-text data: A comparative analysis. Front. Artif. Intell., 3.","DOI":"10.3389\/frai.2020.00042"},{"key":"ref_4","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the NAACL-HLT 2019, Minneapolis, MN, USA."},{"key":"ref_5","unstructured":"Feng, F., Yang, Y., Cer, D., Arivazhagan, N., and Wang, W. (2020, January 5\u201310). Language-agnostic bert sentence embedding. Proceedings of the ACL 2020, Online."},{"key":"ref_6","unstructured":"Mesaros, P., Mandicak, T., Spisakova, M., Behunova, A., and Behun, M. (2021). The implementation factors of information and communication technology in the life cycle costs of buildings. Appl. 
Sci., 11."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"71","DOI":"10.22306\/al.v5i3.97","article-title":"European railway infrastructure: A review","volume":"5","author":"Knapcikova","year":"2018","journal-title":"Acta Logist."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Hrehova, S., and Knapcikova, L. (2022). The study of machine learning assisted the design of selected composites properties. Appl. Sci., 12.","DOI":"10.3390\/app122110863"},{"key":"ref_9","first-page":"63","article-title":"Investigation of mechanical properties of recycled polyvinyl butyral after tensile test","volume":"4","author":"Knapcikova","year":"2018","journal-title":"Acta Technol."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"042064","DOI":"10.1088\/1757-899X\/960\/4\/042064","article-title":"Analysing the implementation motivations of BIM technology in construction project management","volume":"960","author":"Mesaros","year":"2020","journal-title":"IOP Conf. Ser. Mater. Sci. Eng."},{"key":"ref_11","first-page":"509","article-title":"Building information technology in economic sustainable construction project management","volume":"22","author":"Mandicak","year":"2022","journal-title":"SGEM Int. Multidiscip. Sci. GeoConference\u2014EXPO Proc."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Mandicak, T., Mesaros, P., and Kanalikova, A. (2020, January 30\u201331). Digital and ICT competencies of employees for learning under COVID-19 pandemic at the faculty of civil engineering. Proceedings of the ICERI Proceedings, Seville, Spain.","DOI":"10.21125\/iceri.2020.0578"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Kliment, M., Pekarcikova, M., Trebuna, P., and Trebuna, M. (2021). Application of testbed 4.0 technology within the implementation of industry 4.0 in teaching methods of industrial engineering as well as industrial practice. 
Sustainability, 13.","DOI":"10.3390\/su13168963"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Trebuna, P., Pekarcikova, M., and Kliment, M. (2020). Testing the replenishment model strategy using software tecnomatix plant simulation. Innovations in Communication and Computing: 4th EAI International Conference on Management of Manufacturing Systems, Springer.","DOI":"10.1007\/978-3-030-34272-2_10"},{"key":"ref_15","first-page":"39","article-title":"Establishing security measures for the protection of production workers through UWB real-time localization technology","volume":"9","author":"Trebuna","year":"2023","journal-title":"Acta Technol."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Spodniak, M., Hovanec, M., and Korba, P. (2023). Jet engine turbine mechanical properties prediction by using progressive numerical methods. Aerospace, 10.","DOI":"10.3390\/aerospace10110937"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"e26041","DOI":"10.1016\/j.heliyon.2024.e26041","article-title":"A novel method for the natural frequency estimation of the jet engine turbine blades based on its dimensions","volume":"10","author":"Spodniak","year":"2024","journal-title":"Heliyon"},{"key":"ref_18","first-page":"175","article-title":"Aircraft brake temperature from a safety point of view","volume":"94","author":"Korba","year":"2017","journal-title":"Sci. J. Silesian Univ. Technol. Ser. Transp."},{"key":"ref_19","unstructured":"Angelov, D. (2020). Top2Vec: Distributed representations of topics. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1427","DOI":"10.1109\/TKDE.2020.2992485","article-title":"Short text topic modeling techniques, applications, and performance: A survey","volume":"34","author":"Qiang","year":"2022","journal-title":"IEEE Trans. Knowl. 
Data Eng."},{"key":"ref_21","first-page":"14321","article-title":"Short text topic modeling with g-seanmf and semantic aggregation","volume":"82","author":"Wang","year":"2023","journal-title":"Multimed. Tools Appl."},{"key":"ref_22","unstructured":"Zuo, Y., Wu, J., Zhang, H., Lin, H., Wang, F., and Xu, J. (2016, January 1\u20135). A new model for short text topic modeling using word embeddings. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), Austin, TX, USA."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzman, F., Grave, E., Ott, M., Zettlemoyer, L., and Stoyanov, V. (2020, January 5\u201310). Unsupervised cross-lingual representation learning at scale. Proceedings of the ACL 2020, Online.","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Reimers, N., and Gurevych, I. (2019, January 3\u20137). Sentence-bert: Sentence embeddings using siamese bert-networks. Proceedings of the EMNLP-IJCNLP 2019, Hong Kong, China.","DOI":"10.18653\/v1\/D19-1410"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Wu, X., Li, C., Zhu, Y., and Miao, Y. (2020, January 16\u201320). Short text topic modeling with topic distribution quantization and negative sampling decoder. Proceedings of the EMNLP 2020, Online.","DOI":"10.18653\/v1\/2020.emnlp-main.138"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1162\/tacl_a_00325","article-title":"Topic modeling in embedding spaces","volume":"8","author":"Dieng","year":"2020","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_27","unstructured":"Grootendorst, M. (2022). Bertopic: Neural topic modeling with a class-based tf-idf procedure. 
arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"205","DOI":"10.21105\/joss.00205","article-title":"hdbscan: Hierarchical density based clustering","volume":"2","author":"McInnes","year":"2017","journal-title":"J. Open Source Softw."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Pikuliak, M., Grivalsk\u00fd, \u0160., Kon\u00f4pka, M., Bl\u0161t\u00e1k, M., Tamajka, M., Bachrat\u00fd, V., \u0160imko, M., Bal\u00e1\u017eik, P., Trnka, M., and Uhl\u00e1rik, F. (2021, January 6\u201311). Slovakbert: Slovak language model and its evaluation. Proceedings of the 2021 Conference on Computational Linguistics, Online.","DOI":"10.18653\/v1\/2022.findings-emnlp.530"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Pikuliak, M., Grivalsky, S., Konopka, M., Blstak, M., Tamajka, M., Bachraty, V., Simko, M., Balazik, P., Trnka, M., and Uhlarik, F. (2022, January 7\u201311). Slovakbert: Slovak masked language model. Proceedings of the Findings of EMNLP 2022, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.findings-emnlp.530"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Catak, F.O., and Kuzlu, M. (2024). Uncertainty Quantification in Large Language Models Through Convex Hull Analysis. arXiv.","DOI":"10.1007\/s44163-024-00200-w"},{"key":"ref_32","unstructured":"Werling, M., and Moitra, A. (2020, January 13\u201318). Anchor-based topic modeling: Improving interpretability with convex hull methods. Proceedings of the 37th International Conference on Machine Learning (ICML), Virtual."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Bianchi, F., Terragni, S., Hovy, D., Nozza, D., and Fersini, E. (2021, January 19\u201320). Cross-lingual contextualized topic models with zero-shot learning. 
Proceedings of the EACL 2021, Online.","DOI":"10.18653\/v1\/2021.eacl-main.143"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/17\/2\/180\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T15:35:41Z","timestamp":1770737741000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/17\/2\/180"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,10]]},"references-count":33,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2]]}},"alternative-id":["info17020180"],"URL":"https:\/\/doi.org\/10.3390\/info17020180","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,10]]}}}