{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T09:53:04Z","timestamp":1776765184841,"version":"3.51.2"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T00:00:00Z","timestamp":1776643200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T00:00:00Z","timestamp":1776729600000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["EPJ Data Sci."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Ongoing breakthroughs in large language models (LLMs) are reshaping scholarly search and discovery interfaces. While these systems offer new possibilities for navigating scientific knowledge, they also raise concerns about fairness and representational bias rooted in the models\u2019 memorized training data. As LLMs are increasingly used to answer queries about researchers and research communities, their ability to accurately reconstruct scholarly coauthor lists becomes an important but underexamined issue. In this study, we investigate how memorization in LLMs affects the reconstruction of coauthor lists and whether this process reflects existing inequalities across academic disciplines and world regions. We evaluate three prominent models\u2014DeepSeek R1, Llama 4 Scout, and Mixtral 8\u00d77B\u2014by comparing their generated coauthor lists against bibliographic reference data. Our analysis reveals a systematic advantage for highly cited researchers, indicating that LLM memorization disproportionately favors already visible scholars. However, this pattern is not uniform: certain disciplines, such as Clinical Medicine, and some regions, including parts of Africa, exhibit more balanced reconstruction outcomes. These findings highlight both the risks and limitations of relying on LLM-generated relational knowledge in scholarly discovery contexts and emphasize the need for careful auditing of memorization-driven biases in LLM-based systems.<\/jats:p>","DOI":"10.1140\/epjds\/s13688-026-00647-0","type":"journal-article","created":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T16:09:42Z","timestamp":1776701382000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Remembering unequally: global and disciplinary bias in LLM reconstruction of scholarly coauthor lists"],"prefix":"10.1140","volume":"15","author":[{"given":"Ghazal","family":"Kalhor","sequence":"first","affiliation":[]},{"given":"Afra","family":"Mashhadi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2026,4,20]]},"reference":[{"key":"647_CR1","unstructured":"Agarwal S, Laradji IH, Charlin L, et al (2024) Litllm: a toolkit for scientific literature review. arXiv preprint arXiv:2402.01788"},{"key":"647_CR2","unstructured":"Alperin JP, Portenoy J, Demes K, et al (2024) An analysis of the suitability of openalex for bibliometric analyses. arXiv preprint arXiv:2404.17663"},{"key":"647_CR3","doi-asserted-by":"crossref","unstructured":"Bombieri M, Fiorini P, Ponzetto SP, et al (2024) Do llms dream of ontologies? arXiv preprint arXiv:2401.14931","DOI":"10.1145\/3725852"},{"issue":"1","key":"647_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s41109-019-0214-4","volume":"4","author":"G Bravo-Hermsdorff","year":"2019","unstructured":"Bravo-Hermsdorff G, Felso V, Ray E, et al. (2019) Gender and collaboration patterns in a temporal scientific authorship network. Appl Netw Sci 4(1):1\u201317","journal-title":"Appl Netw Sci"},{"key":"647_CR5","volume-title":"The eleventh international conference on learning representations","author":"N Carlini","year":"2022","unstructured":"Carlini N, Ippolito D, Jagielski M, et al. (2022) Quantifying memorization across neural language models. In: The eleventh international conference on learning representations"},{"key":"647_CR6","first-page":"2633","volume-title":"30th USENIX security symposium (USENIX Security 21)","author":"N Carlini","year":"2021","unstructured":"Carlini N, Tramer F, Wallace E, et al. (2021) Extracting training data from large language models. In: 30th USENIX security symposium (USENIX Security 21), pp\u00a02633\u20132650"},{"key":"647_CR7","doi-asserted-by":"publisher","unstructured":"Cholewiak SA, Ipeirotis P, Silva V, et al (2021) SCHOLARLY: simple access to Google Scholar authors and citation using Python. https:\/\/doi.org\/10.5281\/zenodo.5764801. https:\/\/github.com\/scholarly-python-package\/scholarly","DOI":"10.5281\/zenodo.5764801"},{"issue":"4","key":"647_CR8","doi-asserted-by":"publisher","first-page":"2475","DOI":"10.1007\/s11192-025-05293-3","volume":"130","author":"JH Culbert","year":"2025","unstructured":"Culbert JH, Hobert A, Jahn N, et al. (2025) Reference coverage analysis of openalex compared to web of science and scopus. Scientometrics 130(4):2475\u20132492","journal-title":"Scientometrics"},{"key":"647_CR9","unstructured":"DeepSeek-AI, Guo D, Yang D, et al (2025) Deepseek-r1: incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948. https:\/\/arxiv.org\/abs\/2501.12948"},{"issue":"2","key":"647_CR10","doi-asserted-by":"publisher","first-page":"1503","DOI":"10.1007\/s13132-022-00934-x","volume":"14","author":"S Diop","year":"2023","unstructured":"Diop S, Asongu SA (2023) Research productivity: trend and comparative analyses by regions and continents. J Knowl Econ 14(2):1503\u20131521","journal-title":"J Knowl Econ"},{"key":"647_CR11","unstructured":"Google (2024) Google maps platform documentation. https:\/\/developers.google.com\/maps\/documentation, accessed: 2025-05-23"},{"issue":"9","key":"647_CR12","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0256997","volume":"16","author":"N Grodzinski","year":"2021","unstructured":"Grodzinski N, Grodzinski B, Davies BM (2021) Can co-authorship networks be used to predict author research impact? A machine-learning based analysis within the field of degenerative cervical myelopathy research. PLoS ONE 16(9), Article ID e0256997","journal-title":"PLoS ONE"},{"key":"647_CR13","unstructured":"Haryanto CY (2024) Llassist: simple tools for automating literature review using large language models. arXiv preprint arXiv:2407.13993"},{"key":"647_CR14","doi-asserted-by":"publisher","first-page":"9266","DOI":"10.18653\/v1\/2025.naacl-long.469","volume-title":"Proceedings of the 2025 conference of the nations of the Americas chapter of the association for computational linguistics: human language technologies (volume 1: long papers)","author":"J Hayes","year":"2025","unstructured":"Hayes J, Swanberg M, Chaudhari H, et al. (2025) Measuring memorization in language models via probabilistic extraction. In: Proceedings of the 2025 conference of the nations of the Americas chapter of the association for computational linguistics: human language technologies (volume 1: long papers), pp\u00a09266\u20139291"},{"key":"647_CR15","first-page":"547","volume":"37","author":"P Jaccard","year":"1901","unstructured":"Jaccard P (1901) \u00c9tude comparative de la distribution florale dans une portion des alpes et des jura. Bull Soc Vaud Sci Nat 37:547\u2013579","journal-title":"Bull Soc Vaud Sci Nat"},{"key":"647_CR16","unstructured":"Jiang AQ, Sablayrolles A, Roux A, et al (2024) Mixtral of experts. arXiv preprint arXiv:2401.04088. https:\/\/arxiv.org\/abs\/2401.04088"},{"issue":"1","key":"647_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1140\/epjds\/s13688-025-00555-9","volume":"14","author":"G Kalhor","year":"2025","unstructured":"Kalhor G, Ali S, Mashhadi A (2025) Measuring biases in ai-generated co-authorship networks. EPJ Data Sci 14(1):1\u201333","journal-title":"EPJ Data Sci"},{"issue":"1","key":"647_CR18","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1007\/s41109-022-00460-4","volume":"7","author":"G Kalhor","year":"2022","unstructured":"Kalhor G, Asadi Sarijalou A, Sharifi Sadr N, et al. (2022) A new insight to the analysis of co-authorship in Google scholar. Appl Netw Sci 7(1):21","journal-title":"Appl Netw Sci"},{"issue":"1","key":"647_CR19","first-page":"15","volume":"44","author":"JY Kung","year":"2023","unstructured":"Kung JY (2023) Elicit. J Can Health Libr Assoc 44(1):15","journal-title":"J Can Health Libr Assoc"},{"key":"647_CR20","unstructured":"Li Y, Chen L, Liu A, et al (2024) Chatcite: Llm agent with human workflow guidance for comparative literature summary. arXiv preprint arXiv:2403.02574"},{"issue":"1","key":"647_CR21","volume":"5","author":"C L\u00f3pez-Aguirre","year":"2022","unstructured":"L\u00f3pez-Aguirre C, Far\u00edas D (2022) The mirage of scientific productivity and how women are left behind: the Colombian case. Tapuya 5(1), Article ID 2037819","journal-title":"Tapuya"},{"key":"647_CR22","doi-asserted-by":"publisher","first-page":"1038","DOI":"10.18653\/v1\/2024.findings-acl.61","volume-title":"Findings of the association for computational linguistics ACL 2024. Association for computational linguistics, Bangkok, Thailand and virtual meeting","author":"T Luong","year":"2024","unstructured":"Luong T, Le TT, Ngo L, et al. (2024) Realistic evaluation of toxicity in large language models. In: Ku LW, Martins A, Srikumar V (eds) Findings of the association for computational linguistics ACL 2024. Association for computational linguistics, Bangkok, Thailand and virtual meeting, pp\u00a01038\u20131047. https:\/\/doi.org\/10.18653\/v1\/2024.findings-acl.61. https:\/\/aclanthology.org\/2024.findings-acl.61"},{"key":"647_CR23","first-page":"120","volume-title":"International workshop on complex networks","author":"M Macedo","year":"2023","unstructured":"Macedo M, Jaramillo AM, Menezes R (2023) Academic mobility as a driver of productivity: a gender-centric approach. In: International workshop on complex networks. Springer, Berlin, pp\u00a0120\u2013131"},{"key":"647_CR24","doi-asserted-by":"publisher","first-page":"157","DOI":"10.18653\/v1\/2022.acl-short.18","volume-title":"Proceedings of the 60th annual meeting of the association for computational linguistics (volume 2: short papers). Association for computational linguistics","author":"I Magar","year":"2022","unstructured":"Magar I, Schwartz R (2022) Data contamination: from memorization to exploitation. In: Muresan S, Nakov P, Villavicencio A (eds) Proceedings of the 60th annual meeting of the association for computational linguistics (volume 2: short papers). Association for computational linguistics, Dublin, Ireland, pp\u00a0157\u2013165. https:\/\/doi.org\/10.18653\/v1\/2022.acl-short.18. https:\/\/aclanthology.org\/2022.acl-short.18"},{"key":"647_CR25","unstructured":"Manvi R, Khanna S, Burke M, et al (2024) Large language models are geographically biased. arXiv preprint arXiv:2402.02680"},{"key":"647_CR26","unstructured":"Meta AI (2025) The llama 4 herd: the beginning of a new era of natively multimodal ai innovation Meta AI Blog. https:\/\/ai.meta.com\/blog\/llama-4-multimodal-intelligence\/, April 5, 2025"},{"key":"647_CR27","unstructured":"Nasr M, Carlini N, Hayase J, et al (2023) Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035"},{"key":"647_CR28","doi-asserted-by":"crossref","unstructured":"Nguyen TT, Wilson C, Dalins J (2023) Fine-tuning llama 2 large language models for detecting online sexual predatory chats and abusive texts. https:\/\/arxiv.org\/abs\/2308.14683. arXiv:2308.14683","DOI":"10.14428\/esann\/2024.ES2024-222"},{"key":"647_CR29","unstructured":"Priem J, Piwowar H, Orr R (2022) Openalex: a fully-open index of scholarly works, authors, venues, institutions, and concepts. arXiv preprint arXiv:2205.01833"},{"key":"647_CR30","unstructured":"Ranaldi F, Zugarini A, Ranaldi L, et al (2025) Protoknowledge shapes behaviour of llms in downstream tasks: memorization and generalization with knowledge graphs. arXiv preprint arXiv:2505.15501. https:\/\/arxiv.org\/abs\/2505.15501"},{"key":"647_CR31","doi-asserted-by":"crossref","unstructured":"Richardeau G, Chali S, Le Merrer E, et al (2024) Llms prompted for graphs: hallucinations and generative capabilities. arXiv preprint arXiv:2409.00159. https:\/\/arxiv.org\/abs\/2409.00159","DOI":"10.1007\/s41109-025-00754-3"},{"key":"647_CR32","volume-title":"The eleventh international conference on learning representations","author":"A Saparov","year":"2023","unstructured":"Saparov A, He H (2023) Language models are greedy reasoners: a systematic formal analysis of chain-of-thought. In: The eleventh international conference on learning representations"},{"key":"647_CR33","unstructured":"Touvron H, Lavril T, Izacard G, et al (2023) Llama: open and efficient foundation language models. arXiv preprint arXiv:2302.13971. https:\/\/api.semanticscholar.org\/CorpusID:257219404"},{"key":"647_CR34","unstructured":"Wang X, Antoniades A, Elazar Y, et al (2024) Generalization vs memorization: tracing language models\u2019 capabilities back to pretraining data. arXiv preprint arXiv:2407.14985"},{"key":"647_CR35","unstructured":"Wang X, Antoniades A, Elazar Y, et al (2025) Generalization v.s. memorization: tracing language models\u2019 capabilities back to pretraining data. arXiv preprint arXiv:2407.14985. https:\/\/arxiv.org\/abs\/2407.14985"}],"container-title":["EPJ Data Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/article\/10.1140\/epjds\/s13688-026-00647-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1140\/epjds\/s13688-026-00647-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1140\/epjds\/s13688-026-00647-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T09:23:12Z","timestamp":1776763392000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1140\/epjds\/s13688-026-00647-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,20]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,12]]}},"alternative-id":["647"],"URL":"https:\/\/doi.org\/10.1140\/epjds\/s13688-026-00647-0","relation":{},"ISSN":["2193-1127"],"issn-type":[{"value":"2193-1127","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,4,20]]},"assertion":[{"value":"2 November 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 March 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 April 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"38"}}