{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:43:49Z","timestamp":1777455829674,"version":"3.51.4"},"reference-count":50,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T00:00:00Z","timestamp":1672531200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000155","name":"Social Sciences and Humanities Research Council of Canada","doi-asserted-by":"publisher","award":["Canada Graduate Scholarship 767-2015-2217"],"award-info":[{"award-number":["Canada Graduate Scholarship 767-2015-2217"]}],"id":[{"id":"10.13039\/501100000155","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000155","name":"Social Sciences and Humanities Research Council of Canada","doi-asserted-by":"publisher","award":["Michael Smith Foreign Study Supplement"],"award-info":[{"award-number":["Michael Smith Foreign Study Supplement"]}],"id":[{"id":"10.13039\/501100000155","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Big Data &amp; Society"],"published-print":{"date-parts":[[2023,1]]},"abstract":"<jats:p>This paper examines the Web ARChive (WARC) file format, revealing how the format has come to play a central role in the development and standardization of interoperable tools and methods for the international web archiving community. In the context of emerging big data approaches, I consider the sociotechnical relationships between material construction of data and information infrastructures for collecting and research. Analysis is inspired by Star and Griesemer's historical case of the Museum of Vertebrate Zoology which reveals how boundary objects and methods standardization are used to enroll actors in the work of collecting for natural history. I extend these concepts by pairing them with frameworks for studying digital materiality and the representational qualities of data artifacts. Through examples drawn from fieldwork observations studying two data-centered research projects, I consider how the materiality of the WARC format influences research methods and approaches to data extraction, selection, and transformation. Findings identify three modalities researchers use to configure WARC data for researcher needs: using indexes to support search queries, constructing derivative formats designed for certain types of analysis, and generating custom-designed datasets tailored for specific research purposes. Findings additionally reveal similarities in how these distinct methods approach automated data extraction by relying upon the WARC's standardized metadata elements. By interrogating whose information needs are being met and taken into account in the design of the WARC's underlying information representation, I reveal effects on the emerging field of web history, and consider alternative approaches to knowledge production with archived web data.<\/jats:p>","DOI":"10.1177\/20539517231163172","type":"journal-article","created":{"date-parts":[[2023,3,14]],"date-time":"2023-03-14T02:24:30Z","timestamp":1678760670000},"update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":8,"title":["All WARC and no playback: The materialities of data-centered web archives research"],"prefix":"10.1177","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9329-7995","authenticated-orcid":false,"given":"Emily","family":"Maemura","sequence":"first","affiliation":[{"name":"School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL, USA"}]}],"member":"179","published-online":{"date-parts":[[2023,3,13]]},"reference":[{"key":"bibr1-20539517231163172","unstructured":"Bailey J (2020) Archive-It and Archives Unleashed Join Forces to Scale Research Use of Web Archives. Available at: https:\/\/blog.archive.org\/2020\/07\/28\/archive-it-and-archives-unleashed-join-forces-to-scale-research-use-of-web-archives\/"},{"key":"bibr2-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1177\/2053951716654502"},{"key":"bibr3-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1080\/24701475.2018.1455412"},{"key":"bibr4-20539517231163172","doi-asserted-by":"publisher","DOI":"10.7227\/ALX.0022"},{"key":"bibr5-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1002\/asi.21542"},{"key":"bibr6-20539517231163172","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/6352.001.0001"},{"key":"bibr7-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1177\/1461444812462852"},{"issue":"3","key":"bibr8-20539517231163172","volume":"25","author":"Br\u00fcgger N","year":"2020","journal-title":"First Monday"},{"key":"bibr9-20539517231163172","unstructured":"Burner M, Kahle B (1996) ARC File Format Reference. Available at: http:\/\/archive.org\/web\/researcher\/ArcFileFormat.php (accessed 15 February 2023)."},{"key":"bibr10-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1007\/s00799-016-0171-9"},{"key":"bibr11-20539517231163172","unstructured":"Dewey C (2014) How Web archivists and other digital sleuths are unraveling the mystery of MH17.\n                      Washington Post\n                      , 21 July. Available at: https:\/\/www.washingtonpost.com\/news\/the-intersect\/wp\/2014\/07\/21\/how-web-archivists-and-other-digital-sleuths-are-unraveling-the-mystery-of-mh17\/"},{"key":"bibr12-20539517231163172","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/10999.001.0001"},{"key":"bibr13-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1177\/0306312711413314"},{"issue":"5","key":"bibr14-20539517231163172","first-page":"181","volume":"78","author":"Eltgroth DR","year":"2009","journal-title":"Fordham Law Review"},{"key":"bibr15-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1145\/1016978.1016993"},{"key":"bibr16-20539517231163172","unstructured":"Fielding R, Gettys J, Mogul J, et al. (1999) RFC 2616 Hypertext Transfer Protocol. Available at: https:\/\/www.ietf.org\/rfc\/rfc2616.txt (accessed 15 February 2023)."},{"key":"bibr17-20539517231163172","doi-asserted-by":"publisher","DOI":"10.17265\/2159-5313\/2016.09.003"},{"key":"bibr18-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1080\/24701475.2022.2103988"},{"key":"bibr19-20539517231163172","first-page":"139","volume-title":"Web 25","author":"Helmond A","year":"2017"},{"key":"bibr20-20539517231163172","doi-asserted-by":"publisher","DOI":"10.7227\/ALX.0023"},{"key":"bibr21-20539517231163172","unstructured":"IIPC (n.d.) About IIPC. Available at: https:\/\/netpreserve.org\/about-us\/ (accessed 28 March 2022)."},{"key":"bibr22-20539517231163172","unstructured":"Internet Archive (n.d.) About the Internet Archive. Available at: https:\/\/archive.org\/about\/ (accessed 28 March 2022)."},{"key":"bibr23-20539517231163172","unstructured":"ISO 28500:2009 (2009) Information and documentation\u2014WARC file format."},{"key":"bibr24-20539517231163172","unstructured":"ISO 28500:2017 (2017) Information and documentation\u2014WARC file format."},{"key":"bibr25-20539517231163172","volume-title":"Mechanisms: New Media and the Forensic Imagination","author":"Kirschenbaum MG","year":"2012"},{"key":"bibr26-20539517231163172","unstructured":"Kunze J (2005) WARC: an Archiving Format for the Web. In: International Web Archiving Workshop 2005. Available at: https:\/\/web.archive.org\/web\/20120619151338\/http:\/\/www.iwaw.net\/05\/kunze.pdf"},{"key":"bibr27-20539517231163172","doi-asserted-by":"publisher","DOI":"10.5210\/fm.v15i6.3036"},{"key":"bibr28-20539517231163172","first-page":"1","volume":"22","author":"Lischer-Katz Z","year":"2017","journal-title":"First Monday"},{"key":"bibr29-20539517231163172","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/11543.001.0001"},{"key":"bibr30-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1109\/JCDL.2019.00044"},{"key":"bibr31-20539517231163172","unstructured":"Mohr G, Stack M, Ranitovic I, et al. (2004) An Introduction to Heritrix: An open source archival quality web crawler. In: International Web Archiving Workshop 2004. Available at: https:\/\/web.archive.org\/web\/20170809135759\/http:\/\/iwaw.europarchive.org\/04\/Mohr.pdf"},{"issue":"1","key":"bibr32-20539517231163172","first-page":"1","volume":"6","author":"Ogden J","year":"2021","journal-title":"Internet Histories"},{"key":"bibr33-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1007\/s42803-021-00032-5"},{"key":"bibr34-20539517231163172","unstructured":"Padilla T (2017)\n                      On a Collections as Data Imperative\n                      . Available at: http:\/\/digitalpreservation.gov\/meetings\/dcs16\/tpadilla_OnaCollectionsasDataImperative_final.pdf (accessed 15 February 2023)."},{"key":"bibr35-20539517231163172","doi-asserted-by":"crossref","first-page":"147","DOI":"10.7551\/mitpress\/9302.003.0010","volume-title":"\u2018Raw Data\u2019 Is an Oxymoron","author":"Ribes D","year":"2013"},{"key":"bibr36-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1080\/1369118X.2020.1766534"},{"key":"bibr37-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1080\/24701475.2017.1307542"},{"key":"bibr38-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1007\/s42803-020-00029-6"},{"key":"bibr39-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1080\/23257962.2022.2100336"},{"key":"bibr40-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1108\/JD-11-2020-0195"},{"key":"bibr41-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1177\/00027649921955326"},{"key":"bibr42-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1177\/0162243910377624"},{"key":"bibr43-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1177\/030631289019003001"},{"key":"bibr44-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1145\/2998181.2998345"},{"key":"bibr45-20539517231163172","unstructured":"The Archives Unleashed Project (n.d.) About the Archives Unleashed Project. Available at: https:\/\/archivesunleashed.org\/about-project\/ (accessed 28 March 2022)"},{"key":"bibr46-20539517231163172","unstructured":"Thomas A, Meyer ET, Dougherty M, et al. (2010)\n                      Researcher Engagement with Web Archives: Challenges and Opportunities for Investment\n                      . Available at: https:\/\/papers.ssrn.com\/abstract=1715000."},{"key":"bibr47-20539517231163172","volume-title":"The Politics of Mass Digitization","author":"Thylstrup NB","year":"2018"},{"key":"bibr48-20539517231163172","unstructured":"Tofel B (2007) \u2018Wayback\u2019 for accessing web archives. In: International Web Archiving Workshop 2007. Available at: https:\/\/web.archive.org\/web\/20220119160520\/http:\/\/www-poleia.lip6.fr\/\u223cdoucet\/GRBD\/IWAW2007_tofel.pdf"},{"key":"bibr49-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1080\/19312458.2018.1447657"},{"key":"bibr50-20539517231163172","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqac050"}],"container-title":["Big Data &amp; Society"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/20539517231163172","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/20539517231163172","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/20539517231163172","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T13:00:21Z","timestamp":1777381221000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/20539517231163172"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1]]}},"alternative-id":["10.1177\/20539517231163172"],"URL":"https:\/\/doi.org\/10.1177\/20539517231163172","relation":{},"ISSN":["2053-9517","2053-9517"],"issn-type":[{"value":"2053-9517","type":"print"},{"value":"2053-9517","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1]]},"article-number":"20539517231163172"}}