{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,10]],"date-time":"2024-10-10T04:26:40Z","timestamp":1728534400957},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,10,9]],"date-time":"2024-10-09T00:00:00Z","timestamp":1728432000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,10,9]],"date-time":"2024-10-09T00:00:00Z","timestamp":1728432000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"crossref","award":["ANR-21-CE38-0002-01","ANR-21-CE38-0002-01"],"award-info":[{"award-number":["ANR-21-CE38-0002-01","ANR-21-CE38-0002-01"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Netw Sci"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Science is a collaborative endeavor. Yet, unlike co-authorship, interactions within and across teams are seldom reported in a structured way, making them hard to study at scale. We show that Large Language Models (LLMs) can solve this problem, vastly improving the efficiency and quality of network data collection. Our approach iteratively applies filtering with few-shot learning, allowing us to identify and categorize different types of relationships from text. We compare this approach to manual annotation and fuzzy matching using a corpus of digital laboratory notebooks, examining inference quality at the level of edges (recovering a single link), labels (recovering the relationship context) and at the whole-network level (recovering local and global network properties). Large Language Models perform impressively well at each of these tasks, with edge recall rate ranging from 0.8 for the highly contextual case of recovering the task allocation structure of teams from their unstructured attribution page to 0.9 for the more explicit case of retrieving the collaboration with other teams from direct mentions, showing a 32% improvement over a fuzzy matching approach. Beyond science, the flexibility of LLMs means that our approach can be extended broadly through minor prompt revision.<\/jats:p>","DOI":"10.1007\/s41109-024-00658-8","type":"journal-article","created":{"date-parts":[[2024,10,9]],"date-time":"2024-10-09T12:02:44Z","timestamp":1728475364000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Large language models recover scientific collaboration networks from text"],"prefix":"10.1007","volume":"9","author":[{"given":"Rathin","family":"Jeyaram","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robert N","family":"Ward","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marc","family":"Santolini","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,10,9]]},"reference":[{"key":"658_CR1","unstructured":"Alizadeh M, Kubli M, Samei Z, Dehghani S, Bermeo JD, Korobeynikova M, Gilardi F (2023) Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks (arXiv:2307.02179). arXiv. http:\/\/arxiv.org\/abs\/2307.02179"},{"key":"658_CR2","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.11072818","author":"L Blondel","year":"2024","unstructured":"Blondel L, Jeyaram R, Krishna A, Santolini M (2024) iGEM: a model system for team science and innovation. Zenodo. https:\/\/doi.org\/10.5281\/zenodo.11072818","journal-title":"Zenodo"},{"issue":"4","key":"658_CR3","doi-asserted-by":"publisher","first-page":"464","DOI":"10.1177\/1948550619875149","volume":"11","author":"M Chmielewski","year":"2020","unstructured":"Chmielewski M, Kucker SC (2020) An MTurk Crisis? Shifts in Data Quality and the impact on study results. Social Psychol Personality Sci 11(4):464\u2013473. https:\/\/doi.org\/10.1177\/1948550619875149","journal-title":"Social Psychol Personality Sci"},{"issue":"11","key":"658_CR4","doi-asserted-by":"publisher","first-page":"1011","DOI":"10.1037\/h0024051","volume":"21","author":"DJ De Solla Price","year":"1966","unstructured":"De Solla Price DJ, Beaver D (1966) Collaboration in an invisible college. Am Psychol 21(11):1011\u20131018. https:\/\/doi.org\/10.1037\/h0024051","journal-title":"Am Psychol"},{"issue":"CSCW","key":"658_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3274312","volume":"2","author":"S Deri","year":"2018","unstructured":"Deri S, Rappaz J, Aiello M L, Quercia D (2018) Coloring in the links: capturing Social ties as they are perceived. Proc ACM Hum Comput Interact 2(CSCW):1\u201318. https:\/\/doi.org\/10.1145\/3274312","journal-title":"Proc ACM Hum Comput Interact"},{"key":"658_CR6","doi-asserted-by":"publisher","first-page":"200244","DOI":"10.1016\/j.iswa.2023.200244","volume":"19","author":"K Detroja","year":"2023","unstructured":"Detroja K, Bhensdadia CK, Bhatt BS (2023) A survey on relation extraction. Intell Syst Appl 19:200244. https:\/\/doi.org\/10.1016\/j.iswa.2023.200244","journal-title":"Intell Syst Appl"},{"issue":"6379","key":"658_CR7","doi-asserted-by":"publisher","first-page":"eaao0185","DOI":"10.1126\/science.aao0185","volume":"359","author":"S Fortunato","year":"2018","unstructured":"Fortunato S, Bergstrom CT, B\u00f6rner K, Evans JA, Helbing D, Milojevi\u0107 S, Petersen AM, Radicchi F, Sinatra R, Uzzi B, Vespignani A, Waltman L, Wang D, Barab\u00e1si A-L (2018) Science of science. Science 359(6379):eaao0185. https:\/\/doi.org\/10.1126\/science.aao0185","journal-title":"Science"},{"issue":"30","key":"658_CR8","doi-asserted-by":"publisher","first-page":"e2305016120","DOI":"10.1073\/pnas.2305016120","volume":"120","author":"F Gilardi","year":"2023","unstructured":"Gilardi F, Alizadeh M, Kubli M (2023) ChatGPT outperforms crowd-workers for text-annotation tasks. Proc Natl Acad Sci 120(30):e2305016120. https:\/\/doi.org\/10.1073\/pnas.2305016120","journal-title":"Proc Natl Acad Sci"},{"key":"658_CR9","first-page":"100","volume":"82","author":"A Goel","year":"2023","unstructured":"Goel A, Gueta A, Gilon O, Liu C, Erell S, Nguyen LH, Hao X, Jaber B, Reddy S, Kartha R, Steiner J, Laish I, Feder A (2023) LLMs accelerate annotation for medical information extraction. Proc 3rd Mach Learn Health Symp 82:100. https:\/\/proceedings.mlr.press\/v225\/goel23a.html","journal-title":"Proc 3rd Mach Learn Health Symp"},{"issue":"4","key":"658_CR10","doi-asserted-by":"publisher","first-page":"532","DOI":"10.1037\/amp0000319","volume":"73","author":"KL Hall","year":"2018","unstructured":"Hall KL, Vogel AL, Huang GC, Serrano KJ, Rice EL, Tsakraklides SP, Fiore SM (2018) The science of team science: a review of the empirical evidence and research gaps on collaboration in science. Am Psychol 73(4):532\u2013548. https:\/\/doi.org\/10.1037\/amp0000319","journal-title":"Am Psychol"},{"issue":"4","key":"658_CR11","doi-asserted-by":"publisher","first-page":"580","DOI":"10.1111\/2041-210X.13545","volume":"12","author":"C Hoeppke","year":"2021","unstructured":"Hoeppke C, Simmons BI (2021) Maxnodf: an R package for fair and fast comparisons of nestedness between networks. Methods Ecol Evol 12(4):580\u2013585. https:\/\/doi.org\/10.1111\/2041-210X.13545","journal-title":"Methods Ecol Evol"},{"key":"658_CR12","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1017\/S0269888914000277","volume":"30","author":"R Irfan","year":"2015","unstructured":"Irfan R, King C, Grages D, Ewen S, Khan S, Madani S, Ko\u0142odziej J, Wang L, Chen D, Rayes A, Tziritas N, Xu C-Z, Zomaya A, Alzahrani A, Li H (2015) A survey on text mining in social networks. Knowl Eng Rev 30:157\u2013170. https:\/\/doi.org\/10.1017\/S0269888914000277","journal-title":"Knowl Eng Rev"},{"key":"658_CR13","doi-asserted-by":"publisher","unstructured":"Karjus A (2023) Machine-assisted mixed methods: Augmenting humanities and social sciences with artificial intelligence (arXiv:2309.14379). arXiv. https:\/\/doi.org\/10.48550\/arXiv.2309.14379","DOI":"10.48550\/arXiv.2309.14379"},{"key":"658_CR14","doi-asserted-by":"publisher","unstructured":"Kuckartz U, R\u00e4diker S (2021) Using MAXQDA for Mixed Methods Research. In the routledge reviewer\u2019s guide to mixed methods analysis, Routledge, pp 305\u2013318 https:\/\/doi.org\/10.4324\/9780203729434-26","DOI":"10.4324\/9780203729434-26"},{"issue":"5","key":"658_CR15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3344548","volume":"52","author":"V Labatut","year":"2020","unstructured":"Labatut V, Bost X (2020) Extraction and analysis of fictional character networks: a Survey. ACM-CSUR 52(5):1\u201340. https:\/\/doi.org\/10.1145\/3344548","journal-title":"ACM-CSUR"},{"key":"658_CR16","doi-asserted-by":"publisher","unstructured":"Larivi\u00e8re V, Pontille D, Sugimoto CR (2020) Investigating the division of scientific labor using the Contributor Roles Taxonomy (CRediT). Quant Sci Stud 2(1):111\u2013128. https:\/\/doi.org\/10.1162\/qss_a_00097","DOI":"10.1162\/qss_a_00097"},{"issue":"2","key":"658_CR17","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1016\/j.socnet.2008.02.001","volume":"30","author":"E Lazega","year":"2008","unstructured":"Lazega E, Jourda M-T, Mounier L, Stofer R (2008) Catching up with big fish in the big pond? Multi-level network analysis through linked design. Social Networks 30(2):159\u2013176. https:\/\/doi.org\/10.1016\/j.socnet.2008.02.001","journal-title":"Social Networks"},{"key":"658_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.physrep.2019.04.001","volume":"813","author":"MS Mariani","year":"2019","unstructured":"Mariani MS, Ren Z-M, Bascompte J, Tessone CJ (2019) Nestedness in complex networks: Observation, emergence, and implications. Phys Rep 813:1\u201390. https:\/\/doi.org\/10.1016\/j.physrep.2019.04.001","journal-title":"Phys Rep"},{"issue":"1","key":"658_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.5334\/cstp.565","volume":"8","author":"C Masselot","year":"2023","unstructured":"Masselot C, Jeyaram R, Tackx R, Fernandez-Marquez JL, Grey F, Santolini M (2023) Collaboration and performance of Citizen Science projects addressing the Sustainable Development Goals. Citiz Science: Theory Pract 8(1):1. https:\/\/doi.org\/10.5334\/cstp.565","journal-title":"Citiz Science: Theory Pract"},{"issue":"6","key":"658_CR20","doi-asserted-by":"publisher","first-page":"1122","DOI":"10.1287\/mnsc.1110.1470","volume":"58","author":"A Oettl","year":"2012","unstructured":"Oettl A (2012) Reconceptualizing stars: scientist helpfulness and peer performance. Manage Sci 58(6):1122\u20131140. https:\/\/doi.org\/10.1287\/mnsc.1110.1470","journal-title":"Manage Sci"},{"key":"658_CR21","unstructured":"Ollion E, Shen R, Macanovic A, Chatelain A (2023) Chatgpt for Text Annotation? Mind the Hype! SocArXiv. October, 4. https:\/\/files.osf.io\/v1\/resources\/x58kn\/providers\/osfstorage\/651d60731bc8650a79f376cf?action=download&direct&version=1"},{"issue":"1","key":"658_CR22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s42256-023-00783-6","volume":"6","author":"\u00c9 Ollion","year":"2024","unstructured":"Ollion \u00c9, Shen R, Macanovic A, Chatelain A (2024) The dangers of using proprietary LLMs for research. Nat Mach Intell 6(1):1. https:\/\/doi.org\/10.1038\/s42256-023-00783-6","journal-title":"Nat Mach Intell"},{"key":"658_CR23","unstructured":"Pawar S, Palshikar GK, Bhattacharyya P (2017) Relation Extraction: A Survey (arXiv:1712.05191). arXiv. http:\/\/arxiv.org\/abs\/1712.05191"},{"key":"658_CR24","doi-asserted-by":"crossref","unstructured":"Reiss MV (2023) Testing the Reliability of ChatGPT for Text Annotation and Classification: A Cautionary Remark (arXiv:2304.11085). arXiv. http:\/\/arxiv.org\/abs\/2304.11085","DOI":"10.31219\/osf.io\/rvy5p"},{"issue":"5","key":"658_CR25","doi-asserted-by":"publisher","first-page":"1212","DOI":"10.1016\/j.cell.2014.10.050","volume":"159","author":"T Rolland","year":"2014","unstructured":"Rolland T, Ta\u015fan M, Charloteaux B, Pevzner SJ, Zhong Q, Sahni N, Yi S, Lemmens I, Fontanillo C, Mosca R, Kamburov A, Ghiassian SD, Yang X, Ghamsari L, Balcha D, Begg BE, Braun P, Brehme M, Broly MP, Vidal M (2014) A proteome-scale map of the human interactome network. Cell 159(5):1212\u20131226. https:\/\/doi.org\/10.1016\/j.cell.2014.10.050","journal-title":"Cell"},{"key":"658_CR26","unstructured":"Santolini M, Blondel L, Palmer MJ, Ward RN, Jeyaram R, Brink KR, Krishna A, Barabasi A-L (2023) iGEM: A model system for team science and innovation (arXiv:2310.19858). arXiv. http:\/\/arxiv.org\/abs\/2310.19858"},{"key":"658_CR27","doi-asserted-by":"publisher","unstructured":"Sauermann H, Haeussler C (2017) Authorship and contribution disclosures. Sci Advances 3(11):e1700404 https:\/\/doi.org\/10.1126\/sciadv.1700404","DOI":"10.1126\/sciadv.1700404"},{"key":"658_CR28","unstructured":"Shenoy V (2024) Varunshenoy\/GraphGPT [JavaScript]. https:\/\/github.com\/varunshenoy\/GraphGPT (2023)"},{"key":"658_CR29","doi-asserted-by":"crossref","unstructured":"Snow R, O\u2019Connor B, Jurafsky D, Ng A (2008) Cheap and Fast \u2013 But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. In M. Lapata & H. T. Ng (Eds.), Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 254\u2013263). Association for Computational Linguistics. https:\/\/aclanthology.org\/D08\u20131027","DOI":"10.3115\/1613715.1613751"},{"issue":"6","key":"658_CR30","doi-asserted-by":"publisher","first-page":"1417","DOI":"10.1111\/1365-2656.12749","volume":"86","author":"C Song","year":"2017","unstructured":"Song C, Rohr RP, Saavedra S (2017) Why are some plant\u2013pollinator networks more nested than others? J Anim Ecol 86(6):1417\u20131424. https:\/\/doi.org\/10.1111\/1365-2656.12749","journal-title":"J Anim Ecol"},{"key":"658_CR31","unstructured":"T\u00f6rnberg P (2023) ChatGPT\u20134 outperforms experts and crowd workers in Annotating Political Twitter messages with zero-shot learning. arXiv. arXiv:2304.06588. http:\/\/arxiv.org\/abs\/2304.06588"},{"issue":"8","key":"658_CR32","doi-asserted-by":"publisher","first-page":"1584","DOI":"10.1016\/j.respol.2015.04.010","volume":"44","author":"JP Walsh","year":"2015","unstructured":"Walsh JP, Lee Y-N (2015) The bureaucratization of science. Res Policy 44(8):1584-1600. https:\/\/doi.org\/10.1016\/j.respol.2015.04.010","journal-title":"Res Policy"},{"key":"658_CR33","doi-asserted-by":"publisher","unstructured":"Xu F, Wu L, Evans J (2022) Flat teams drive scientific innovation. Proceedings of the National Academy of Sciences, 119(23), e2200927119. https:\/\/doi.org\/10.1073\/pnas.2200927119","DOI":"10.1073\/pnas.2200927119"}],"container-title":["Applied Network Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41109-024-00658-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41109-024-00658-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41109-024-00658-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,9]],"date-time":"2024-10-09T12:03:45Z","timestamp":1728475425000},"score":1,"resource":{"primary":{"URL":"https:\/\/appliednetsci.springeropen.com\/articles\/10.1007\/s41109-024-00658-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,9]]},"references-count":33,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["658"],"URL":"https:\/\/doi.org\/10.1007\/s41109-024-00658-8","relation":{},"ISSN":["2364-8228"],"issn-type":[{"value":"2364-8228","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,9]]},"assertion":[{"value":"30 April 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 August 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 October 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"64"}}