{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:01:35Z","timestamp":1772164895379,"version":"3.50.1"},"reference-count":63,"publisher":"MIT Press","issue":"3","license":[{"start":{"date-parts":[[2024,5,6]],"date-time":"2024-05-06T00:00:00Z","timestamp":1714953600000},"content-version":"vor","delay-in-days":126,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["SFB 1270\/2: 299150580"],"award-info":[{"award-number":["SFB 1270\/2: 299150580"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Software is a central part of modern science, and knowledge of its use is crucial for the scientific community with respect to reproducibility and attribution of its developers. Several studies have investigated in-text mentions of software and its quality, while the quality of formal software citations has only been analyzed superficially. This study performs an in-depth evaluation of formal software citation based on a set of manually annotated software references. It examines which resources are cited for software usage, to what extent they allow proper identification of software and its specific version, how this information is made available by scientific publishers, and how well it is represented in large-scale bibliographic databases. The results show that software articles are the most cited resource for software, while direct software citations are better suited for identification of software versions. Moreover, we found current practices by both publishers and bibliographic databases to be unsuited to represent these direct software citations, hindering large-scale analyses such as assessing software impact. We argue that current practices for representing software citations\u2014the recommended way to cite software by current citation standards\u2014stand in the way of their adoption by the scientific community, and urge providers of bibliographic data to explicitly model scientific software.<\/jats:p>","DOI":"10.1162\/qss_a_00309","type":"journal-article","created":{"date-parts":[[2024,5,6]],"date-time":"2024-05-06T14:11:46Z","timestamp":1715004706000},"page":"637-667","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":0,"title":["A multilevel analysis of data quality for formal software citation"],"prefix":"10.1162","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4203-8851","authenticated-orcid":true,"given":"David","family":"Schindler","sequence":"first","affiliation":[{"name":"Institute of Communications Engineering, University of Rostock, Rostock, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-7011-6796","authenticated-orcid":true,"given":"Tazin","family":"Hossain","sequence":"additional","affiliation":[{"name":"Institute of Communications Engineering, University of Rostock, Rostock, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7225-9992","authenticated-orcid":true,"given":"Sascha","family":"Spors","sequence":"additional","affiliation":[{"name":"Institute of Communications Engineering, University of Rostock, Rostock, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7925-3363","authenticated-orcid":true,"given":"Frank","family":"Kr\u00fcger","sequence":"additional","affiliation":[{"name":"Institute of Communications Engineering, University of Rostock, Rostock, Germany"},{"name":"Faculty of Engineering, Hochschule Wismar, University of Applied Sciences: Technology, Business and Design, Wismar, Germany"},{"name":"Department Knowledge, Culture & Transformation, University of Rostock, Rostock, Germany"}]}],"member":"281","published-online":{"date-parts":[[2024,8,1]]},"reference":[{"key":"2024093020305967600_bib1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.5960048","volume-title":"Quarto","author":"Allaire","year":"2023"},{"issue":"1","key":"2024093020305967600_bib2","doi-asserted-by":"publisher","first-page":"377","DOI":"10.1162\/qss_a_00019","article-title":"Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies","volume":"1","author":"Baas","year":"2020","journal-title":"Quantitative Science Studies"},{"key":"2024093020305967600_bib3","volume-title":"magrittr: A forward-pipe operator for R","author":"Bache","year":"2022"},{"issue":"3","key":"2024093020305967600_bib4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1541880.1541883","article-title":"Methodologies for data quality assessment and improvement","volume":"41","author":"Batini","year":"2009","journal-title":"ACM Computing Surveys"},{"issue":"1","key":"2024093020305967600_bib5","doi-asserted-by":"publisher","first-page":"8","DOI":"10.3847\/1538-4365\/ab7be6","article-title":"Credit lost: Two decades of software citation in astronomy","volume":"249","author":"Bouquin","year":"2020","journal-title":"The Astrophysical Journal Supplement Series"},{"key":"2024093020305967600_bib6","volume-title":"ggalluvial: Alluvial plots in \u2018ggplot2\u2019","author":"Brunson","year":"2023"},{"key":"2024093020305967600_bib7","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1007\/s13278-018-0501-6","article-title":"Link prediction for interdisciplinary collaboration via co-authorship network","volume":"8","author":"Cho","year":"2018","journal-title":"Social Network Analysis and Mining"},{"key":"2024093020305967600_bib22","article-title":"Why do we need to compare research software, and how should we do it?","volume-title":"Proceedings of the 4th Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4)","author":"Chue Hong","year":"2016"},{"issue":"1","key":"2024093020305967600_bib8","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1177\/001316446002000104","article-title":"A coefficient of agreement for nominal scales","volume":"20","author":"Cohen","year":"1960","journal-title":"Educational and Psychological Measurement"},{"key":"2024093020305967600_bib9","volume-title":"xtable: Export tables to LaTeX or HTML","author":"Dahl","year":"2019"},{"issue":"12","key":"2024093020305967600_bib10","doi-asserted-by":"publisher","first-page":"e239","DOI":"10.1371\/journal.pcbi.0030239","article-title":"The modular organization of domain structures: Insights into protein\u2013protein binding","volume":"3","author":"Del Sol","year":"2007","journal-title":"PLOS Computational Biology"},{"issue":"1","key":"2024093020305967600_bib11","doi-asserted-by":"publisher","first-page":"16","DOI":"10.2218\/ijdc.v15i1.698","article-title":"Curated archiving of research software artifacts: Lessons learned from the French open archive (HAL)","volume":"15","author":"Di Cosmo","year":"2020","journal-title":"International Journal of Digital Curation"},{"issue":"3","key":"2024093020305967600_bib12","doi-asserted-by":"publisher","first-page":"312","DOI":"10.1017\/pan.2018.12","article-title":"Gendered citation patterns across political science and social science methodology fields","volume":"26","author":"Dion","year":"2018","journal-title":"Political Analysis"},{"issue":"7","key":"2024093020305967600_bib13","doi-asserted-by":"publisher","first-page":"870","DOI":"10.1002\/asi.24454","article-title":"Softcite dataset: A dataset of software mentions in biomedical and economic research publications","volume":"72","author":"Du","year":"2021","journal-title":"Journal of the Association for Information Science and Technology"},{"key":"2024093020305967600_bib14","doi-asserted-by":"publisher","first-page":"e1022","DOI":"10.7717\/peerj-cs.1022","article-title":"Understanding progress in software citation: A study of software citation in the CORD-19 corpus","volume":"8","author":"Du","year":"2022","journal-title":"PeerJ Computer Science"},{"key":"2024093020305967600_bib15","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1186\/s13326-015-0026-0","article-title":"Ambiguity and variability of database and software names in bioinformatics","volume":"6","author":"Duck","year":"2015","journal-title":"Journal of Biomedical Semantics"},{"issue":"6","key":"2024093020305967600_bib16","doi-asserted-by":"publisher","first-page":"e0157989","DOI":"10.1371\/journal.pone.0157989","article-title":"A survey of bioinformatics database and software usage through mining the literature","volume":"11","author":"Duck","year":"2016","journal-title":"PLOS ONE"},{"issue":"1\u20132","key":"2024093020305967600_bib17","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1016\/S0378-3758(99)00047-6","article-title":"Simultaneous confidence intervals for multinomial proportions","volume":"82","author":"Glaz","year":"1999","journal-title":"Journal of Statistical Planning and Inference"},{"issue":"5","key":"2024093020305967600_bib18","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1109\/MIC.2014.88","article-title":"Better software, better research","volume":"18","author":"Goble","year":"2014","journal-title":"IEEE Internet Computing"},{"issue":"12","key":"2024093020305967600_bib19","doi-asserted-by":"publisher","first-page":"126","DOI":"10.1145\/1610252.1610285","article-title":"Assessing open source software as a scholarly contribution","volume":"52","author":"Hafer","year":"2009","journal-title":"Communications of the ACM"},{"key":"2024093020305967600_bib20","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1007\/978-3-319-09785-5_8","article-title":"The use of bibliometrics for assessing research: Possibilities, limitations and adverse effects","volume-title":"Incentives and performance","author":"Haustein","year":"2014"},{"issue":"1","key":"2024093020305967600_bib21","doi-asserted-by":"publisher","first-page":"414","DOI":"10.1162\/qss_a_00022","article-title":"Crossref: The sustainable source of community-owned scholarly metadata","volume":"1","author":"Hendricks","year":"2020","journal-title":"Quantitative Science Studies"},{"issue":"9","key":"2024093020305967600_bib23","doi-asserted-by":"publisher","first-page":"2137","DOI":"10.1002\/asi.23538","article-title":"Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature","volume":"67","author":"Howison","year":"2016","journal-title":"Journal of the Association for Information Science and Technology"},{"key":"2024093020305967600_bib24","doi-asserted-by":"publisher","first-page":"155","DOI":"10.5281\/zenodo.8305981","article-title":"A large dataset of software mentions in the biomedical literature","volume-title":"Proceedings of the 19th International Conference of the International Society for Scientometrics and Informetrics","author":"Istrate","year":"2023"},{"key":"2024093020305967600_bib25","doi-asserted-by":"publisher","first-page":"1257","DOI":"10.12688\/f1000research.26932.2","article-title":"Recognizing the value of software: A software citation guide","volume":"9","author":"Katz","year":"2021","journal-title":"F1000Research"},{"issue":"1","key":"2024093020305967600_bib26","doi-asserted-by":"publisher","first-page":"e7","DOI":"10.5334\/jors.by","article-title":"Transitive credit and JSON-LD","volume":"3","author":"Katz","year":"2015","journal-title":"Journal of Open Research Software"},{"key":"2024093020305967600_bib27","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2301.10140","article-title":"The Semantic Scholar open data platform","author":"Kinney","year":"2023","journal-title":"arXiv"},{"key":"2024093020305967600_bib28","volume-title":"easyalluvial: Generate alluvial plots with a single line of code","author":"Koneswarakantha","year":"2022"},{"key":"2024093020305967600_bib29","doi-asserted-by":"publisher","first-page":"476","DOI":"10.1007\/978-3-031-27077-2_37","article-title":"Link-rot in web-sourced multimedia datasets","volume-title":"International conference on multimedia modeling","author":"Lakic","year":"2023"},{"issue":"2","key":"2024093020305967600_bib30","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1629\/uksg.233","article-title":"CrossRef text and data mining services","volume":"28","author":"Lammey","year":"2015","journal-title":"Insights"},{"issue":"1","key":"2024093020305967600_bib31","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","article-title":"The measurement of observer agreement for categorical data","volume":"33","author":"Landis","year":"1977","journal-title":"Biometrics"},{"issue":"10","key":"2024093020305967600_bib32","doi-asserted-by":"publisher","first-page":"e0164461","DOI":"10.1371\/journal.pone.0164461","article-title":"Responses of bovine innate immunity to Mycobacterium avium subsp. paratuberculosis infection revealed by changes in gene expression and levels of microRNA","volume":"11","author":"Malvisi","year":"2016","journal-title":"PLOS ONE"},{"key":"2024093020305967600_bib33","volume-title":"rcompanion: Functions to support extension education program evaluation","author":"Mangiafico","year":"2023"},{"issue":"9","key":"2024093020305967600_bib34","doi-asserted-by":"publisher","first-page":"e0183806","DOI":"10.1371\/journal.pone.0183806","article-title":"Pathological grooming: Evidence for a single factor behind trichotillomania, skin picking and nail biting","volume":"12","author":"Maraz","year":"2017","journal-title":"PLOS ONE"},{"issue":"6","key":"2024093020305967600_bib35","doi-asserted-by":"publisher","first-page":"1341","DOI":"10.1002\/asi.23721","article-title":"Assessing and tracing the outcomes and impact of research infrastructures","volume":"68","author":"Mayernik","year":"2017","journal-title":"Journal of the Association for Information Science and Technology"},{"key":"2024093020305967600_bib36","doi-asserted-by":"publisher","first-page":"486","DOI":"10.1109\/eScience.2017.78","article-title":"Understanding software in research: Initial results from examining nature and a call for collaboration","volume-title":"2017 IEEE 13th International Conference on e-Science (e-Science)","author":"Nangia","year":"2017"},{"issue":"1","key":"2024093020305967600_bib37","doi-asserted-by":"publisher","first-page":"bbab456","DOI":"10.1093\/bib\/bbab456","article-title":"Impact of computational approaches in the fight against COVID-19: An AI guided review of 17 000 studies","volume":"23","author":"Napolitano","year":"2022","journal-title":"Briefings in Bioinformatics"},{"issue":"4","key":"2024093020305967600_bib38","doi-asserted-by":"publisher","first-page":"860","DOI":"10.1016\/j.joi.2015.07.012","article-title":"Assessing the impact of software on science: A bootstrapped learning of software entities in full-text papers","volume":"9","author":"Pan","year":"2015","journal-title":"Journal of Informetrics"},{"issue":"11","key":"2024093020305967600_bib39","doi-asserted-by":"publisher","first-page":"e1001217","DOI":"10.1371\/journal.pgen.1001217","article-title":"Genome-wide association meta-analysis of cortical bone mineral density unravels allelic heterogeneity at the RANKL locus and potential pleiotropic effects on bone","volume":"6","author":"Paternoster","year":"2010","journal-title":"PLOS Genetics"},{"key":"2024093020305967600_bib40","volume-title":"patchwork: The composer of plots","author":"Pedersen","year":"2022"},{"issue":"1","key":"2024093020305967600_bib41","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1007\/s11192-020-03397-6","article-title":"The practice of self-citations: A longitudinal study","volume":"123","author":"Peroni","year":"2020","journal-title":"Scientometrics"},{"key":"2024093020305967600_bib42","volume-title":"Rstudio: Integrated development environment for R","author":"Posit Team","year":"2023"},{"key":"2024093020305967600_bib43","volume-title":"R: A language and environment for statistical computing","author":"R Core Team","year":"2023"},{"key":"2024093020305967600_bib44","volume-title":"articlenizer","author":"Schindler","year":"2021"},{"key":"2024093020305967600_bib45","doi-asserted-by":"publisher","first-page":"4574","DOI":"10.1145\/3459637.3482017","article-title":"SoMeSci\u2014A 5 star open data gold standard knowledge graph of software mentions in scientific articles","volume-title":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","author":"Schindler","year":"2021"},{"key":"2024093020305967600_bib46","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.4968738","article-title":"SoMeSci","author":"Schindler","year":"2021","journal-title":"Zenodo"},{"key":"2024093020305967600_bib47","doi-asserted-by":"publisher","first-page":"e835","DOI":"10.7717\/peerj-cs.835","article-title":"The role of software in science: A knowledge graph-based analysis of software mentions in PubMed Central","volume":"8","author":"Schindler","year":"2022","journal-title":"PeerJ Computer Science"},{"key":"2024093020305967600_bib48","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.10815753","volume-title":"SoMeSci_Citation","author":"Schindler","year":"2024"},{"key":"2024093020305967600_bib49","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1007\/978-3-030-49461-2_16","article-title":"Investigating software usage in the social sciences: A knowledge graph approach","volume-title":"The semantic web","author":"Schindler","year":"2020"},{"key":"2024093020305967600_bib50","volume-title":"DescTools: Tools for descriptive statistics","author":"Signorell","year":"2023"},{"issue":"429","key":"2024093020305967600_bib51","doi-asserted-by":"publisher","first-page":"366","DOI":"10.1080\/01621459.1995.10476521","article-title":"Simultaneous confidence intervals and sample size determination for multinomial proportions","volume":"90","author":"Sison","year":"1995","journal-title":"Journal of the American Statistical Association"},{"key":"2024093020305967600_bib52","doi-asserted-by":"publisher","first-page":"e86","DOI":"10.7717\/peerj-cs.86","article-title":"Software citation principles","volume":"2","author":"Smith","year":"2016","journal-title":"PeerJ Computer Science"},{"issue":"2","key":"2024093020305967600_bib53","doi-asserted-by":"publisher","DOI":"10.2218\/ijdc.v11i2.390","article-title":"Citations for software: Providing identification, access and recognition for research software","volume":"11","author":"Soito","year":"2016","journal-title":"International Journal of Digital Curation"},{"issue":"1","key":"2024093020305967600_bib54","doi-asserted-by":"publisher","first-page":"656","DOI":"10.1038\/s41597-023-02491-7","article-title":"Journal production guidance for software and data citations","volume":"10","author":"Stall","year":"2023","journal-title":"Scientific Data"},{"key":"2024093020305967600_bib55","first-page":"102","article-title":"BRAT: A web-based tool for NLP-assisted text annotation","volume-title":"Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics","author":"Stenetorp","year":"2012"},{"issue":"7524","key":"2024093020305967600_bib56","doi-asserted-by":"publisher","first-page":"550","DOI":"10.1038\/514550a","article-title":"The top 100 papers","volume":"514","author":"Van Noorden","year":"2014","journal-title":"Nature"},{"key":"2024093020305967600_bib57","volume-title":"Python","author":"Van Rossum","year":"2022"},{"key":"2024093020305967600_bib58","doi-asserted-by":"publisher","first-page":"739","DOI":"10.1145\/3487553.3527147","article-title":"The Semantic Scholar Academic Graph (S2AG)","volume-title":"Companion Proceedings of the Web Conference 2022","author":"Wade","year":"2022"},{"issue":"3","key":"2024093020305967600_bib59","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1080\/09296174.2013.799918","article-title":"Binomial confidence intervals and contingency tests: Mathematical fundamentals and the evaluation of alternative methods","volume":"20","author":"Wallis","year":"2013","journal-title":"Journal of Quantitative Linguistics"},{"issue":"43","key":"2024093020305967600_bib60","doi-asserted-by":"publisher","first-page":"1686","DOI":"10.21105\/joss.01686","article-title":"Welcome to the tidyverse","volume":"4","author":"Wickham","year":"2019","journal-title":"Journal of Open Source Software"},{"key":"2024093020305967600_bib61","volume-title":"tidyverse","author":"Wickham","year":"2021"},{"issue":"8","key":"2024093020305967600_bib62","doi-asserted-by":"publisher","first-page":"e0183811","DOI":"10.1371\/journal.pone.0183811","article-title":"Towards a social functional account of laughter: Acoustic features convey reward, affiliation, and dominance","volume":"12","author":"Wood","year":"2017","journal-title":"PLOS ONE"},{"issue":"9","key":"2024093020305967600_bib63","doi-asserted-by":"publisher","first-page":"104846","DOI":"10.1016\/j.respol.2023.104846","article-title":"Open source software and global entrepreneurship","volume":"52","author":"Wright","year":"2023","journal-title":"Research Policy"}],"container-title":["Quantitative Science Studies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/qss\/article-pdf\/5\/3\/637\/2472775\/qss_a_00309.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/qss\/article-pdf\/5\/3\/637\/2472775\/qss_a_00309.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,30]],"date-time":"2024-09-30T16:31:45Z","timestamp":1727713905000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/qss\/article\/5\/3\/637\/120941\/A-multilevel-analysis-of-data-quality-for-formal"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":63,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,8,1]]}},"URL":"https:\/\/doi.org\/10.1162\/qss_a_00309","relation":{"has-review":[{"id-type":"doi","id":"10.1162\/QSS_A_00309\/v1\/decision1","asserted-by":"object"},{"id-type":"doi","id":"10.1162\/QSS_A_00309\/v2\/decision1","asserted-by":"object"},{"id-type":"doi","id":"10.1162\/QSS_A_00309\/v2\/response1","asserted-by":"object"},{"id-type":"doi","id":"10.1162\/QSS_A_00309\/v1\/review1","asserted-by":"object"},{"id-type":"doi","id":"10.1162\/QSS_A_00309\/v1\/review2","asserted-by":"object"},{"id-type":"doi","id":"10.1162\/QSS_A_00309\/v2\/review1","asserted-by":"object"}]},"ISSN":["2641-3337"],"issn-type":[{"value":"2641-3337","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}