{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T18:15:15Z","timestamp":1754158515195,"version":"3.41.2"},"reference-count":37,"publisher":"Emerald","issue":"3","license":[{"start":{"date-parts":[[2021,8,9]],"date-time":"2021-08-09T00:00:00Z","timestamp":1628467200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["EL"],"published-print":{"date-parts":[[2021,11,4]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title><jats:p>This paper aims to discuss the challenges encountered in collecting, cleaning and analyzing the large data set of bibliographic metadata records in machine-readable cataloging [MARC 21] format. Possible solutions are presented.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title><jats:p>This mixed method study relied on content analysis and social network analysis. The study examined subject representation in MARC 21 metadata records created in 2020 in WorldCat \u2013 the largest international database of \u201cbig smart data.\u201d The methodological challenges that were encountered and solutions are examined.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Findings<\/jats:title><jats:p>In this general review paper with a focus on methodological issues, the discussion of challenges is followed by a discussion of solutions developed and tested as part of this study. Data collection, processing, analysis and visualization are addressed separately. Lessons learned and conclusions related to challenges and solutions for the design of a large-scale study evaluating MARC 21 bibliographic metadata from WorldCat are given. 
Overall recommendations for the design and implementation of future research are suggested.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title><jats:p>There are no previous publications that address the challenges and solutions of data collection and analysis of WorldCat\u2019s \u201cbig smart data\u201d in the form of MARC 21 data. This is the first study to use a large data set to systematically examine MARC 21 library metadata records created after the most recent addition of new fields and subfields to MARC 21 Bibliographic Format standard in 2019 based on resource description and access rules. It is also the first to focus its analyses on the networks formed by subject terms shared by MARC 21 bibliographic records in a data set extracted from a heterogeneous centralized database WorldCat.<\/jats:p><\/jats:sec>","DOI":"10.1108\/el-11-2020-0316","type":"journal-article","created":{"date-parts":[[2021,8,11]],"date-time":"2021-08-11T06:33:17Z","timestamp":1628663597000},"page":"486-503","source":"Crossref","is-referenced-by-count":0,"title":["Collecting and evaluating large volumes of bibliographic metadata aggregated in the WorldCat database: a proposed methodology to overcome challenges"],"prefix":"10.1108","volume":"39","author":[{"given":"Vyacheslav I.","family":"Zavalin","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shawne D.","family":"Miksa","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"140","published-online":{"date-parts":[[2021,8,9]]},"reference":[{"volume-title":"Big Data, Little Data, No Data: Scholarship in the Networked World","year":"2015","key":"key2021110310070702600_ref001"},{"issue":"1","key":"key2021110310070702600_ref002","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1002\/asi.5090190103","article-title":"Information science: what is 
it?","volume":"19","year":"1968","journal-title":"American Documentation"},{"key":"key2021110310070702600_ref003","first-page":"36","article-title":"Big data, big metadata and quantitative study of science: a workflow model for big scientometrics","volume":"54","year":"2017","journal-title":"Proceedings of the 80th ASISandT Annual Meeting"},{"issue":"1\/2","key":"key2021110310070702600_ref004","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1108\/LR-09-2014-0097","article-title":"Understanding semantic web: a conceptual model","volume":"64","year":"2015","journal-title":"Library Review"},{"volume-title":"Looking for Information: A Survey of Research on Information Seeking, Needs and Behavior","year":"2012","key":"key2021110310070702600_ref005"},{"issue":"5","key":"key2021110310070702600_ref006","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1108\/09696470510700394","article-title":"Semantic networks and social networks","volume":"12","year":"2005","journal-title":"The Learning Organization"},{"article-title":"Motif simplification: improving network visualization readability with fan, connector, and clique glyphs","volume-title":"in ACM SIGCHI Conference on Human Factors in Computing Systems, 27 April-2 May, Paris, France: www.cs.umd.edu\/hcil\/trs\/2012-29\/2012-29.pdf","year":"2013","key":"key2021110310070702600_ref007"},{"issue":"3","key":"key2021110310070702600_ref008","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1515\/jdis-2017-0012","article-title":"Big metadata, smart metadata, and metadata capital","volume":"2","year":"2017","journal-title":"Journal of Data and Information Science"},{"issue":"3","key":"key2021110310070702600_ref009","doi-asserted-by":"crossref","first-page":"212","DOI":"10.5860\/crl.66.3.212","article-title":"What have we got to lose? 
The effect of controlled vocabulary on keyword searching results","volume":"66","year":"2005","journal-title":"College and Research Libraries"},{"issue":"1","key":"key2021110310070702600_ref010","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/01639374.2014.917447","article-title":"Still a lot to lose: the role of controlled vocabulary in keyword searching","volume":"53","year":"2015","journal-title":"Cataloging and Classification Quarterly"},{"key":"key2021110310070702600_ref011","first-page":"1","article-title":"A fast multi-scale method for drawing large graphs","volume-title":"Graph Drawing: GD 2000 (Lecture Notes in Computer Science Series, Vol. 1984)","year":"2001"},{"volume-title":"Knowledge Organization","year":"2018","key":"key2021110310070702600_ref012"},{"key":"key2021110310070702600_ref013","unstructured":"IFLA (2019), \u201cBest practice for national bibliographic agencies in a digital age\u201d, available at: www.ifla.org\/node\/7858"},{"issue":"3","key":"key2021110310070702600_ref014","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1177\/001316447003000308","article-title":"Determining sample size for research activities","volume":"30","year":"1970","journal-title":"Educational and Psychological Measurement"},{"key":"key2021110310070702600_ref015","unstructured":"MarcEdit Development (2013), \u201cAbout MarcEdit\u201d, available at: https:\/\/MarcEdit.reeset.net\/about-MarcEdit"},{"key":"key2021110310070702600_ref016","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1016\/j.enpol.2015.06.025","article-title":"A network analysis using metadata to investigate innovation in clean-tech: implications for energy policy","volume":"86","year":"2015","journal-title":"Energy Policy"},{"key":"key2021110310070702600_ref017","unstructured":"Mierswa, I. and Klinkenberg, R. 
(2020), \u201cRapidMiner studio (version 9.6) [computer software]\u201d, available at: https:\/\/rapidminer.com"},{"volume-title":"The Networked Catalog","year":"2014","key":"key2021110310070702600_ref018"},{"key":"key2021110310070702600_ref019","unstructured":"NISO (2021), \u201cZ39.50: a primer on the protocol in NISO\u201d, available at: www.niso.org\/publications\/z3950-primer-protocol"},{"key":"key2021110310070702600_ref020","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1002\/pra2.216","article-title":"Diffusion and adoption of linked data among libraries","volume-title":"in Proceedings of 83rd Annual Meeting of the Association for Information Science and Technology, 25-29 October, 57: e216","year":"2020"},{"key":"key2021110310070702600_ref021","unstructured":"OCLC (2021), \u201cInside WorldCat in OCLC\u201d, available at: www.oclc.org\/en\/worldcat\/inside-worldcat.html"},{"issue":"3","key":"key2021110310070702600_ref022","doi-asserted-by":"crossref","first-page":"22","DOI":"10.6017\/ital.v34i3.5889","article-title":"Evaluation of semi-automatic metadata generation tools: a survey of the current state of the art","volume":"34","year":"2015","journal-title":"Information Technology and Libraries"},{"key":"key2021110310070702600_ref023","unstructured":"Phillips, M.E. 
(2020), \u201cExploring the use of metadata record graphs for metadata assessment\u201d, Doctoral dissertation, University of North Texas, Denton, TX."},{"issue":"2","key":"key2021110310070702600_ref024","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1504\/IJMSO.2020.108326","article-title":"Exploring the utility of metadata record graphs and network analysis for metadata quality evaluation and augmentation","volume":"14","year":"2020","journal-title":"International Journal of Metadata, Semantics and Ontologies"},{"volume-title":"Social Network Analysis: History, Theory and Methodology","year":"2012","key":"key2021110310070702600_ref025"},{"key":"key2021110310070702600_ref026","unstructured":"Salo, D. (2017), \u201cRetooling libraries for the data challenge\u201d, Minds @ UW, available at: https:\/\/minds.wisconsin.edu\/bitstream\/handle\/1793\/46142\/DataChallenge.pdf"},{"key":"key2021110310070702600_ref027","first-page":"2570","article-title":"Information science","volume-title":"Encyclopedia of Library and Information Science","year":"2009"},{"issue":"3","key":"key2021110310070702600_ref028","first-page":"2","article-title":"Big? Smart? Clean? Messy? Data in the humanities","volume":"2","year":"2013","journal-title":"Journal for Digital Humanities"},{"key":"key2021110310070702600_ref029","first-page":"223","article-title":"Is quality metadata shareable metadata? The implications of local metadata practices for federated collections","volume-title":"Proceedings of the 12th National Conference of the Association of College and Research Libraries","year":"2005"},{"key":"key2021110310070702600_ref030","unstructured":"Smith, M., Ceni, A., Milic-Frayling, N., Shneiderman, B., Mendes Rodrigues, E., Leskovec, J. and Dunne, C. 
(2010), \u201cNodeXL: a free and open network overview, discovery and exploration add-in for excel 2007\/2010\/2013\/2016\u201d, Social Media Research Foundation, available at: www.smrfoundation.org"},{"key":"key2021110310070702600_ref031","first-page":"30","article-title":"An exploratory analysis of subject metadata in the digital public library of america","volume-title":"Metadata and Ubiquitous Access to Culture, Science, and Digital Humanities: Proceedings of the International Conference and Workshop on Dublin Core and Metadata Applications, S\u00e3o Paulo, Brazil","year":"2015"},{"key":"key2021110310070702600_ref032","unstructured":"Topham, K. (2018), \u201cOf python and pandas: using programming to improve discovery and access\u201d, available at: https:\/\/saaers.wordpress.com\/2018\/10\/09\/of-python-and-pandas-using-programming-to-improve-discovery-and-access\/"},{"issue":"4","key":"key2021110310070702600_ref033","doi-asserted-by":"crossref","first-page":"674","DOI":"10.2307\/2096399","article-title":"The sources and consequences of embeddedness for the economic performance of organizations: the network effect","volume":"61","year":"1996","journal-title":"American Sociological Review"},{"issue":"3","key":"key2021110310070702600_ref034","doi-asserted-by":"crossref","first-page":"319","DOI":"10.3233\/ISU-170853","article-title":"Advancing library cyberinfrastructure for big data sharing and reuse","volume":"37","year":"2017","journal-title":"Information Services and Use"},{"key":"key2021110310070702600_ref035","unstructured":"Zavalin, V.I. 
(2020), \u201cExploration of RDA-based MARC 21 subject metadata in WorldCat database and its readiness to support linked data functionality\u201d, Doctoral dissertation, University of North Texas, Denton, TX."},{"article-title":"Subject access, smart data, and digital humanities: finding unlimited opportunities through their intersections","volume-title":"Presented at IFLA Classification and Indexing Satellite Conference, 11-12 August, Columbus, OH","year":"2016","key":"key2021110310070702600_ref036"},{"issue":"3","key":"key2021110310070702600_ref037","doi-asserted-by":"crossref","first-page":"233","DOI":"10.3233\/SW-130117","article-title":"Linked data, big data, and the 4th paradigm","volume":"4","year":"2013","journal-title":"Semantic Web"}],"container-title":["The Electronic Library"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/EL-11-2020-0316\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/EL-11-2020-0316\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,25]],"date-time":"2025-07-25T01:08:24Z","timestamp":1753405704000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/el\/article\/39\/3\/486-503\/99135"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,9]]},"references-count":37,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2021,8,9]]},"published-print":{"date-parts":[[2021,11,4]]}},"alternative-id":["10.1108\/EL-11-2020-0316"],"URL":"https:\/\/doi.org\/10.1108\/el-11-2020-0316","relation":{},"ISSN":["0264-0473","0264-0473"],"issn-type":[{"type":"print","value":"0264-0473"},{"type":"print","value":"0264-0473"}],"subject":[],"published":{"date-parts":[[2021,8,9]]}}}