{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,16]],"date-time":"2025-12-16T12:52:14Z","timestamp":1765889534700,"version":"build-2065373602"},"reference-count":18,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,8,26]],"date-time":"2025-08-26T00:00:00Z","timestamp":1756166400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>Background: Large language models (LLMs) such as ChatGPT have evolved rapidly, with notable improvements in coherence, factual accuracy, and contextual relevance. However, their academic and clinical applicability remains under scrutiny. This study evaluates the temporal performance evolution of LLMs by comparing earlier model outputs (GPT-3.5 and GPT-4.0) with ChatGPT-4.5 across three domains: aesthetic surgery counseling, an academic discussion of base of thumb arthritis, and a systematic literature review. Methods: We replicated the methodologies of three previously published studies using identical prompts in ChatGPT-4.5. Each output was assessed against its predecessor using a nine-domain Likert-based rubric measuring factual accuracy, completeness, reference quality, clarity, clinical insight, scientific reasoning, bias avoidance, utility, and interactivity. Expert reviewers in plastic and reconstructive surgery independently scored and compared model outputs across versions. Results: ChatGPT-4.5 outperformed earlier versions across all domains. Reference quality improved most significantly (a score increase of +4.5), followed by factual accuracy (+2.5), scientific reasoning (+2.5), and utility (+2.5). In aesthetic surgery counseling, GPT-3.5 produced generic responses lacking clinical detail, whereas ChatGPT-4.5 offered tailored, structured, and psychologically sensitive advice. 
In academic writing, ChatGPT-4.5 eliminated reference hallucination, correctly applied evidence hierarchies, and demonstrated advanced reasoning. In the literature review, recall remained suboptimal, but precision, citation accuracy, and contextual depth improved substantially. Conclusion: ChatGPT-4.5 represents a major step forward in LLM capability, particularly in generating trustworthy academic and clinical content. While not yet suitable as a standalone decision-making tool, its outputs now support research planning and early-stage manuscript preparation. Persistent limitations include information recall and interpretive flexibility. Continued validation is essential to ensure ethical, effective use in scientific workflows.<\/jats:p>","DOI":"10.3390\/informatics12030086","type":"journal-article","created":{"date-parts":[[2025,8,26]],"date-time":"2025-08-26T14:43:18Z","timestamp":1756219398000},"page":"86","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["The Temporal Evolution of Large Language Model Performance: A Comparative Analysis of Past and Current Outputs in Scientific and Medical Research"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5444-8925","authenticated-orcid":false,"given":"Ishith","family":"Seth","sequence":"first","affiliation":[{"name":"Department of Plastic and Reconstructive Surgery, Peninsula Health, Frankston, VIC 3199, Australia"},{"name":"Department of Plastic and Reconstructive Surgery, Austin Health, Heidelberg, VIC 3084, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8621-8216","authenticated-orcid":false,"given":"Gianluca","family":"Marcaccini","sequence":"additional","affiliation":[{"name":"Department of Plastic and Reconstructive Surgery, University of Siena, 53100 Siena, 
Italy"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9647-5180","authenticated-orcid":false,"given":"Bryan","family":"Lim","sequence":"additional","affiliation":[{"name":"Department of Plastic and Reconstructive Surgery, Peninsula Health, Frankston, VIC 3199, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-3919-904X","authenticated-orcid":false,"given":"Jennifer","family":"Novo","sequence":"additional","affiliation":[{"name":"Faculty of Medicine and Surgery, The University of Notre Dame, Chippendale, NSW 2008, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5130-8628","authenticated-orcid":false,"given":"Stephen","family":"Bacchi","sequence":"additional","affiliation":[{"name":"Harvard Medical School, Massachusetts General Hospital, Boston, MA 02144, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8396-095X","authenticated-orcid":false,"given":"Roberto","family":"Cuomo","sequence":"additional","affiliation":[{"name":"Department of Plastic and Reconstructive Surgery, University of Siena, 53100 Siena, Italy"}]},{"given":"Richard J.","family":"Ross","sequence":"additional","affiliation":[{"name":"Department of Plastic and Reconstructive Surgery, Peninsula Health, Frankston, VIC 3199, Australia"},{"name":"Department of Plastic and Reconstructive Surgery, Austin Health, Heidelberg, VIC 3084, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4092-182X","authenticated-orcid":false,"given":"Warren M.","family":"Rozen","sequence":"additional","affiliation":[{"name":"Department of Plastic and Reconstructive Surgery, Peninsula Health, Frankston, VIC 3199, Australia"},{"name":"Department of Plastic and Reconstructive Surgery, Austin Health, Heidelberg, VIC 3084, Australia"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"3701","DOI":"10.1097\/JS9.0000000000001312","article-title":"ChatGPT in medicine: Prospects and challenges: A review 
article","volume":"110","author":"Tan","year":"2024","journal-title":"Int. J. Surg."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Tangsrivimol, J.A., Darzidehkalani, E., Virk, H.U.H., Wang, Z., Egger, J., Wang, M., Hacking, S., Glicksberg, B.S., Strauss, M., and Krittanawong, C. (2025). Benefits, limits, and risks of ChatGPT in medicine. Front. Artif. Intell., 8.","DOI":"10.3389\/frai.2025.1518049"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1985","DOI":"10.1007\/s00266-023-03338-7","article-title":"Aesthetic surgery advice and counseling from artificial intelligence: A rhinoplasty consultation with ChatGPT","volume":"47","author":"Xie","year":"2023","journal-title":"Aesthetic Plast. Surg."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"e53164","DOI":"10.2196\/53164","article-title":"Hallucination rates and reference accuracy of ChatGPT and Bard for systematic reviews: Comparative analysis","volume":"26","author":"Chelli","year":"2024","journal-title":"J. Med. Internet Res."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Sallam, M. (2023). ChatGPT utility in healthcare education, research, and practice: Systematic review on the promising perspectives and valid concerns. Healthcare, 11.","DOI":"10.3390\/healthcare11060887"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"e4999","DOI":"10.1097\/GOX.0000000000004999","article-title":"Artificial or augmented authorship? A conversation with a chatbot on base of thumb arthritis","volume":"11","author":"Seth","year":"2023","journal-title":"Plast. Reconstr. Surg. Glob. Open"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Seth, I., Marcaccini, G., Lim, K., Castrechini, M., Cuomo, R., Ng, S.K.-H., Ross, R.J., and Rozen, W.M. (2025). Management of Dupuytren\u2019s disease: A multi-centric comparative analysis between experienced hand surgeons versus artificial intelligence. 
Diagnostics, 15.","DOI":"10.3390\/diagnostics15050587"},{"key":"ref_8","first-page":"1","article-title":"Artificial intelligence versus human researcher performance for systematic literature searches: A study focusing on the surgical management of base of thumb arthritis","volume":"12","author":"Seth","year":"2025","journal-title":"Plast. Aesthetic Res."},{"key":"ref_9","first-page":"187","article-title":"Progress, challenges, threats and prospects of ChatGPT in science and education: How will AI impact the academic environment?","volume":"3","author":"Nematov","year":"2025","journal-title":"J. Adv. Artif. Intell."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1007\/s44313-025-00062-w","article-title":"Transforming hematological research documentation with large language models: An approach to scientific writing and data analysis","volume":"60","author":"Yang","year":"2025","journal-title":"Blood Res."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"S178","DOI":"10.1055\/s-0044-1800801","article-title":"Navigating artificial intelligence in scientific manuscript writing: Tips and traps","volume":"35","author":"Kumar","year":"2025","journal-title":"Indian J. Radiol. Imaging"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Marcaccini, G., Seth, I., Xie, Y., Susini, P., Pozzi, M., Cuomo, R., and Rozen, W.M. (2025). Breaking bones, breaking barriers: ChatGPT, DeepSeek, and Gemini in hand fracture management. J. Clin. Med., 14.","DOI":"10.3390\/jcm14061983"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"On, S.W., Cho, S.W., Park, S.Y., Ha, J.-W., Yi, S.-M., Park, I.-Y., Byun, S.-H., and Yang, B.-E. (2025). Chat generative pre-trained transformer (ChatGPT) in oral and maxillofacial surgery: A narrative review on its research applications and limitations. J. Clin. Med., 14.","DOI":"10.3390\/jcm14041363"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wang, J., Shue, K., Liu, L., and Hu, G. 
(2025). Preliminary evaluation of ChatGPT model iterations in emergency department diagnostics. Sci. Rep., 15.","DOI":"10.1038\/s41598-025-95233-1"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"532","DOI":"10.18203\/2320-6012.ijrms20244167","article-title":"Artificial intelligence in scientific writing: Opportunities and ethical considerations","volume":"13","author":"Sharma","year":"2024","journal-title":"Int. J. Res. Med. Sci."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1001\/jama.2024.21700","article-title":"Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review","volume":"333","author":"Bedi","year":"2025","journal-title":"JAMA"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Shool, S., Adimi, S., Amleshi, R.S., Bitaraf, E., Golpira, R., and Tara, M. (2025). A systematic review of large language model (LLM) evaluations in clinical medicine. BMC Med. Inform. Decis. Mak., 25.","DOI":"10.1186\/s12911-025-02954-4"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1038\/s41586-023-06291-2","article-title":"Large language models encode clinical 
knowledge","volume":"620","author":"Singhal","year":"2023","journal-title":"Nature"}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/12\/3\/86\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:33:03Z","timestamp":1760034783000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/12\/3\/86"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,26]]},"references-count":18,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["informatics12030086"],"URL":"https:\/\/doi.org\/10.3390\/informatics12030086","relation":{},"ISSN":["2227-9709"],"issn-type":[{"type":"electronic","value":"2227-9709"}],"subject":[],"published":{"date-parts":[[2025,8,26]]}}}