{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T00:36:12Z","timestamp":1773189372189,"version":"3.50.1"},"reference-count":49,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T00:00:00Z","timestamp":1769731200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>The increasing volume and diversity of scientific publications poses challenges for scalable and interpretable topic discovery and automated document categorization. This study proposes an integrated framework that combines probabilistic topic modeling with supervised classification to support large-scale scientific literature analysis. Using 3689 abstracts from the Journal of Forensic Sciences (2009\u20132022), Latent Dirichlet Allocation (LDA) is applied to uncover latent thematic structures, assess topic diagnosticity across forensic disciplines, and analyze temporal research trends. Bayesian model selection with repeated resampling identifies a stable topic resolution, with the number of topics T lying in the range 83\u201388, yielding semantically coherent and discipline-aligned topics. The resulting document\u2013topic representations are then used for supervised abstract classification. Across multiple models and resampling scenarios, the strongest and most stable performance is achieved under a Grouped Category configuration. In particular, XGBoost attains an Accuracy of 0.754 and a Macro-averaged F1 score of 0.737 at T=88, with comparable results at neighboring topic counts, indicating robustness to topic granularity. 
Overall, the proposed framework provides a reproducible, interpretable, and computationally efficient pipeline for literature organization, trend analysis, and metadata enhancement in scientific domains.<\/jats:p>","DOI":"10.3390\/informatics13020024","type":"journal-article","created":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T14:35:46Z","timestamp":1769783746000},"page":"24","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Exploring Scientific Literature Using Topic Modeling: A Practical Framework for Discovery and Classification"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-1424-5589","authenticated-orcid":false,"given":"Amir","family":"Alipour Yengejeh","sequence":"first","affiliation":[{"name":"Department of Statistics and Data Science, University of Central Florida, Orlando, FL 32816, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7276-155X","authenticated-orcid":false,"given":"Larry","family":"Tang","sequence":"additional","affiliation":[{"name":"Department of Statistics and Data Science, University of Central Florida, Orlando, FL 32816, USA"},{"name":"National Center for Forensic Science, University of Central Florida, P.O. Box 162367, Orlando, FL 32816, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0472-3025","authenticated-orcid":false,"given":"Candice M.","family":"Bridge","sequence":"additional","affiliation":[{"name":"National Center for Forensic Science, University of Central Florida, P.O. 
Box 162367, Orlando, FL 32816, USA"},{"name":"Department of Chemistry, University of Central Florida, Orlando, FL 32816, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8967-0593","authenticated-orcid":false,"given":"Chandra","family":"Kundu","sequence":"additional","affiliation":[{"name":"Department of Statistics and Data Science, University of Central Florida, Orlando, FL 32816, USA"}]}],"member":"1968","published-online":{"date-parts":[[2026,1,30]]},"reference":[{"key":"ref_1","first-page":"993","article-title":"Latent Dirichlet Allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"5228","DOI":"10.1073\/pnas.0307752101","article-title":"Finding scientific topics","volume":"101","author":"Griffiths","year":"2004","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_3","unstructured":"Ponweiser, M. (2012). Latent Dirichlet Allocation in R, WU Vienna University of Economics and Business. Theses\/Institute for Statistics and Mathematics."},{"key":"ref_4","first-page":"1","article-title":"topicmodels: An R Package for Fitting Topic Models","volume":"40","author":"Hornik","year":"2011","journal-title":"J. Stat. Softw."},{"key":"ref_5","unstructured":"Gatti, C.J., Brooks, J.D., and Nurre, S.G. (2015). A historical analysis of the field of OR\/MS using topic models. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1016\/j.trc.2017.01.013","article-title":"Discovering themes and trends in transportation research using topic modeling","volume":"77","author":"Sun","year":"2017","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1016\/j.cie.2019.06.010","article-title":"Analyzing scientific research topics in manufacturing field using a topic model","volume":"135","author":"Xiong","year":"2019","journal-title":"Comput. Ind. 
Eng."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"120114","DOI":"10.1016\/j.eswa.2023.120114","article-title":"Discovering topics and trends in the field of Artificial Intelligence: Using LDA topic modeling","volume":"225","author":"Yu","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Shen, F., Mojarad, M.R., Li, D., Liu, S., Tao, C., Yu, Y., and Liu, H. (2018). Systematic identification of latent disease-gene associations from PubMed articles. PLoS ONE, 18.","DOI":"10.1371\/journal.pone.0191568"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Madz\u00edk, P., and Fal\u00e1t, L. (2022). State-of-the-art on analytic hierarchy process in the last 40 years: Literature review based on Latent Dirichlet Allocation topic modelling. PLoS ONE, 17.","DOI":"10.1371\/journal.pone.0268777"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1186\/s40537-019-0255-7","article-title":"Smart literature review: A practical topic modelling approach to exploratory literature review","volume":"6","author":"Asmussen","year":"2019","journal-title":"J. Big Data"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1186\/s40537-022-00605-3","article-title":"An intelligent literature review: Adopting inductive approach to define machine learning applications in the clinical domain","volume":"9","author":"Sabharwal","year":"2022","journal-title":"J. Big Data"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1186\/s40537-025-01068-y","article-title":"A computational analysis of aspect-based sentiment analysis research through bibliometric mapping and topic modeling","volume":"12","author":"Chen","year":"2025","journal-title":"J. 
Big Data"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1186\/s40537-021-00551-6","article-title":"Modeling the public attitude towards organic foods: A big data and text mining approach","volume":"9","author":"Singh","year":"2022","journal-title":"J. Big Data"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Colangelo, M.T., Guizzardi, S., and Galli, C. (2024). Topic modeling as a tool to identify research diversity: A study across dental disciplines. Metrics, 1.","DOI":"10.20944\/preprints202408.1649.v1"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Rejeb, A., Rejeb, K., Molavi, H., and Keogh, J.G. (2025). A Data-Driven Topic Modeling Analysis of Blockchain in Food Supply Chain Traceability. Information, 16.","DOI":"10.3390\/info16121096"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Debortoli, S., M\u00fcller, O., Junglas, I., and vom Brocke, J. (2016). Text Mining For Information Systems Researchers: An Annotated Topic Modeling Tutorial, Swansea University. Communications of the Association for Information Systems.","DOI":"10.17705\/1CAIS.03907"},{"key":"ref_18","first-page":"e2","article-title":"Topic Modeling: A Comprehensive Review","volume":"20","author":"Kherwa","year":"2019","journal-title":"EAI Endorsed Trans. Scalable Inf. Syst."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2551","DOI":"10.1007\/s41060-024-00610-0","article-title":"Dynamic topic modelling for exploring the scientific literature on coronavirus: An unsupervised labelling technique","volume":"20","author":"Corcho","year":"2025","journal-title":"Int. J. Data Sci. Anal."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1007\/s10462-023-10661-7","article-title":"A Survey on Neural Topic Models: Methods, Applications, and Challenges","volume":"57","author":"Wu","year":"2024","journal-title":"Artif. Intell. Rev."},{"key":"ref_21","unstructured":"Grootendorst, M. (2022). 
BERTopic: Neural Topic Modeling with a Class-Based TF-IDF Procedure. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Galli, C., Cusano, C., Meleti, M., Donos, N., and Calciolari, E. (2024). Topic Modeling for Faster Literature Screening Using Transformer-Based Embeddings. Metrics, 1.","DOI":"10.20944\/preprints202407.2198.v1"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Pham, C.M., Hoyle, A., Sun, S., Resnik, P., and Iyyer, M. (2024). TopicGPT: A Prompt-Based Topic Modeling Framework. arXiv.","DOI":"10.18653\/v1\/2024.naacl-long.164"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Doi, T., Isonuma, M., and Yanaka, H. (2024, January 11\u201316). Topic modeling for short texts with large language models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.acl-srw.3"},{"key":"ref_25","unstructured":"Doi, T., Isonuma, M., and Yanaka, H. (2024). A Comprehensive Evaluation of Large Language Models for Topic Modeling. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/s00799-025-00429-5","article-title":"Toward purpose-oriented topic model evaluation enabled by large language models","volume":"26","author":"Tan","year":"2025","journal-title":"Int. J. Digit. Libr."},{"key":"ref_27","unstructured":"Mu, Y., Dong, C., Bontcheva, K., and Song, X. (2024). Large language models as alternatives to topic modeling. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L.E., and Brown, D.E. (2019). Text Classification Algorithms: A Survey. Information, 10.","DOI":"10.3390\/info10040150"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer. 
[2nd ed.].","DOI":"10.1007\/978-0-387-84858-7"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1145\/2133806.2133826","article-title":"Probabilistic Topic Models","volume":"55","author":"Blei","year":"2012","journal-title":"Commun. ACM"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1007\/s41133-020-00032-0","article-title":"A comparative analysis of logistic regression, random forest and KNN models for text classification","volume":"5","author":"Shah","year":"2020","journal-title":"Augment. Hum. Res."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_34","unstructured":"Zhang, Q. (2020, January 27\u201329). The text classification of theft crime based on TF-IDF and XGBoost model. Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China."},{"key":"ref_35","unstructured":"Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.Y. (2017, January 4\u20139). LightGBM: A highly efficient gradient boosting decision tree. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Lubis, A.R., Prayudani, S., Fatmi, Y., and Nugroho, O. (2022, January 22\u201323). Classifying news based on Indonesian news using LightGBM. 
Proceedings of the 2022 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM), Surabaya, Indonesia.","DOI":"10.1109\/CENIM56801.2022.10037401"},{"key":"ref_37","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic Minority Over-sampling Technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"21631","DOI":"10.1038\/s41598-025-05791-7","article-title":"A Comprehensive Evaluation of Oversampling Techniques for Enhancing Text Classification Performance","volume":"15","author":"Taskiran","year":"2025","journal-title":"Sci. Rep."},{"key":"ref_40","first-page":"37","article-title":"Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation","volume":"2","author":"Powers","year":"2011","journal-title":"J. Mach. Learn. Technol."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1016\/j.ipm.2009.03.002","article-title":"A Systematic Analysis of Performance Measures for Classification Tasks","volume":"45","author":"Sokolova","year":"2009","journal-title":"Inf. Process. Manag."},{"key":"ref_42","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), Minneapolis, MN, USA."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Beltagy, I., Lo, K., and Cohan, A. (2019). SciBERT: A Pretrained Language Model for Scientific Text. 
arXiv.","DOI":"10.18653\/v1\/D19-1371"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1162\/tacl_a_00325","article-title":"Topic Modeling in Embedding Spaces","volume":"8","author":"Dieng","year":"2020","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Al Azher, I., Reddy, V.D., Akella, A.P., and Alhoori, H. (2025, January 15\u201319). LimTopic: LLM-Based Topic Modeling and Text Summarization for Analyzing Scientific Articles. Proceedings of the 24th ACM\/IEEE Joint Conference on Digital Libraries (JCDL), Hong Kong, China.","DOI":"10.1145\/3677389.3702605"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Rodionov, D., Konnikov, E., Golikov, G., and Yakob, P. (2026). Structural\u2013Semantic Term Weighting for Interpretable Topic Modeling with Higher Coherence and Lower Token Overlap. Information, 17.","DOI":"10.3390\/info17010022"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Glun\u010di\u0107, T., Bari\u0107, D., and Glun\u010di\u0107, M. (2025). VISTA: A Multi-View, Hierarchical, and Interpretable Framework for Robust Topic Modelling. Mach. Learn. Knowl. Extr., 7.","DOI":"10.3390\/make7040162"},{"key":"ref_48","first-page":"6750","article-title":"Precision\u2013Recall Balanced Topic Modelling","volume":"32","author":"Virtanen","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1214\/25-EJS2343","article-title":"Two-Step Mixed-Type Multivariate Bayesian Sparse Variable Selection with Shrinkage Priors","volume":"19","author":"Wang","year":"2025","journal-title":"Electron. J. 
Stat."}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/13\/2\/24\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T11:53:02Z","timestamp":1770810782000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/13\/2\/24"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,30]]},"references-count":49,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2]]}},"alternative-id":["informatics13020024"],"URL":"https:\/\/doi.org\/10.3390\/informatics13020024","relation":{},"ISSN":["2227-9709"],"issn-type":[{"value":"2227-9709","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,30]]}}}