{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:41:54Z","timestamp":1760060514586,"version":"build-2065373602"},"reference-count":28,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T00:00:00Z","timestamp":1756425600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Research articles are valuable resources for Information Retrieval and Natural Language Processing (NLP) tasks, offering opportunities to analyze key components of scholarly content. This study investigates the presence of methodological terminology in psychology research over the past 30 years (1995\u20132024) by applying a novel NLP and Machine Learning pipeline to a large corpus of 85,452 abstracts, as well as the extent to which this terminology forms distinct thematic groupings. Combining glossary-based extraction, contextualized language model embeddings, and dual-mode clustering, this study offers a scalable framework for the exploration of methodological transparency in scientific text via deep semantic structures. A curated glossary of 365 method-related keywords served as a gold-standard reference for term identification, using direct and fuzzy string matching. Retrieved terms were encoded with SciBERT, averaging embeddings across contextual occurrences to produce unified vectors. These vectors were clustered using unsupervised and weighted unsupervised approaches, yielding six and ten clusters, respectively. Cluster composition was analyzed using weighted statistical measures to assess term importance within and across groups. 
A total of 78.16% of the examined abstracts contained glossary terms, with an average of 1.8 terms per abstract, highlighting an increasing presence of methodological terminology in psychology and reflecting a shift toward greater transparency in research reporting. This work goes beyond the use of static vectors by incorporating contextual understanding in the examination of methodological terminology, while offering a scalable and generalizable approach to semantic analysis in scientific texts, with implications for meta-research, domain-specific lexicon development, and automated scientific knowledge discovery.<\/jats:p>","DOI":"10.3390\/bdcc9090224","type":"journal-article","created":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T16:18:15Z","timestamp":1756484295000},"page":"224","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Identifying Methodological Language in Psychology Abstracts: A Machine Learning Approach Using NLP and Embedding-Based Clustering"],"prefix":"10.3390","volume":"9","author":[{"given":"Konstantinos G.","family":"Stathakis","sequence":"first","affiliation":[{"name":"School of Science and Technology, International Hellenic University, 57001 Thessaloniki, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9361-8621","authenticated-orcid":false,"given":"George","family":"Papageorgiou","sequence":"additional","affiliation":[{"name":"School of Science and Technology, International Hellenic University, 57001 Thessaloniki, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8263-9024","authenticated-orcid":false,"given":"Christos","family":"Tjortjis","sequence":"additional","affiliation":[{"name":"School of Science and Technology, International Hellenic University, 57001 Thessaloniki, 
Greece"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1931","DOI":"10.1007\/s11192-018-2921-5","article-title":"Information Extraction from Scientific Articles: A Survey","volume":"117","author":"Nasar","year":"2018","journal-title":"Scientometrics"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"3383","DOI":"10.1007\/s11837-021-04902-9","article-title":"Challenges and Advances in Information Extraction from Scientific Literature: A Review","volume":"73","author":"Hong","year":"2021","journal-title":"JOM"},{"key":"ref_3","unstructured":"American Psychological Association (2020). Publication Manual of the American Psychological Association: The Official Guide to APA Style, American Psychological Association. [7th ed.]."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1037\/a0021524","article-title":"Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect","volume":"100","author":"Bem","year":"2011","journal-title":"J. Personal. Soc. Psychol."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ioannidis, J.P.A. (2005). Why Most Published Research Findings Are False. PLoS Med., 2.","DOI":"10.1371\/journal.pmed.0020124"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"aac4716","DOI":"10.1126\/science.aac4716","article-title":"Estimating the Reproducibility of Psychological Science","volume":"349","author":"Collaboration","year":"2015","journal-title":"Science"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1027\/1864-9335\/a000178","article-title":"Investigating Variation in Replicability: A \u201cMany Labs\u201d Replication Project","volume":"45","author":"Klein","year":"2014","journal-title":"Soc. 
Psychol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"191375","DOI":"10.1098\/rsos.191375","article-title":"Raising the Value of Research Studies in Psychological Science by Increasing the Credibility of Research Reports: The Transparent Psi Project","volume":"10","author":"Kekecs","year":"2023","journal-title":"R. Soc. Open Sci."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1080\/00031305.2018.1447512","article-title":"What Have We (Not) Learnt from Millions of Scientific Papers with P Values?","volume":"73","author":"Ioannidis","year":"2019","journal-title":"Am. Stat."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"e2208863120","DOI":"10.1073\/pnas.2208863120","article-title":"A Discipline-Wide Investigation of the Replicability of Psychology Papers over the Past Two Decades","volume":"120","author":"Youyou","year":"2023","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_11","first-page":"147","article-title":"Classifying Positive Results in Clinical Psychology Using Natural Language Processing","volume":"232","author":"Schiekiera","year":"2024","journal-title":"Z. Psychol."},{"key":"ref_12","first-page":"3","article-title":"How to Identify Hot Topics in Psychology Using Topic Modeling","volume":"226","author":"Bittermann","year":"2018","journal-title":"Z. 
Psychol."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"48","DOI":"10.21037\/mhealth.2019.09.06","article-title":"Capturing the Trend of MHealth Research Using Text Mining","volume":"5","author":"Park","year":"2019","journal-title":"Mhealth"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"100831","DOI":"10.1016\/j.mex.2020.100831","article-title":"A Clustering Approach for Topic Filtering within Systematic Literature Reviews","volume":"7","author":"Ohrndorf","year":"2020","journal-title":"MethodsX"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"9699","DOI":"10.1007\/s11192-021-04069-9","article-title":"Mapping the Field of Psychology: Trends in Research Topics 1995\u20132015","volume":"126","author":"Wieczorek","year":"2021","journal-title":"Scientometrics"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Weng, M.-H., Wu, S., and Dyer, M. (2022). Identification and Visualization of Key Topics in Scientific Publications with Transformer-Based Language Models and Document Clustering Methods. Appl. Sci., 12.","DOI":"10.3390\/app122111220"},{"key":"ref_17","unstructured":"IEOM Society International (2022, January 26\u201328). Analysis of Research Trends in Artificial Intelligence and Healthcare Convergence Using Text Mining Techniques. Proceedings of the International Conference on Industrial Engineering and Operations Management, Rome, Italy."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1108\/FS-02-2023-0026","article-title":"Identifying Emerging Trends and Hot Topics through Intelligent Data Mining: The Case of Clinical Psychology and Psychotherapy","volume":"26","author":"Sokolova","year":"2024","journal-title":"Foresight"},{"key":"ref_19","unstructured":"(2024, August 15). National Center for Biotechnology Information (NCBI) PubMed, Available online: https:\/\/pubmed.ncbi.nlm.nih.gov\/."},{"key":"ref_20","unstructured":"Sayers, E. (2018). 
Entrez Programming Utilities Help."},{"key":"ref_21","unstructured":"Elsevier (2024, October 15). Scopus. Available online: https:\/\/www.scopus.com\/."},{"key":"ref_22","unstructured":"Elsevier (2024, August 15). Scopus Search API. Available online: https:\/\/dev.elsevier.com\/."},{"key":"ref_23","unstructured":"Ovid Technologies (Wolters Kluwer) (2024, October 15). APA PsycINFO. Available online: https:\/\/ovidsp.dc1.ovid.com\/ovid-new-a\/ovidweb.cgi."},{"key":"ref_24","unstructured":"Stathakis, K. (2025, May 26). Research Paper\u2014GitHub Repository. Available online: https:\/\/github.com\/KosStath\/nlp-ml-analysis-of-psych-methods."},{"key":"ref_25","unstructured":"Kwantlen Polytechnic University (2019). Glossary. Research Methods in Psychology, Kwantlen Polytechnic University. [4th ed.]."},{"key":"ref_26","unstructured":"Maricopa Open Press (2010). Glossary of Terms. Introduction to Statistics for Psychology, Maricopa Open Press."},{"key":"ref_27","unstructured":"Dalhousie University (2014). Key Terms for Psychological Research. Introduction to Psychology & Neuroscience, Dalhousie University."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1016\/j.ins.2018.08.018","article-title":"Clustering Based on Grid and Local Density with Priority-Based Expansion for Multi-Density Data","volume":"468","author":"Dong","year":"2018","journal-title":"Inf. 
Sci."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/9\/224\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:35:22Z","timestamp":1760034922000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/9\/224"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,29]]},"references-count":28,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["bdcc9090224"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9090224","relation":{},"ISSN":["2504-2289"],"issn-type":[{"type":"electronic","value":"2504-2289"}],"subject":[],"published":{"date-parts":[[2025,8,29]]}}}