{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T14:19:42Z","timestamp":1766067582763,"version":"3.41.0"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,9,16]],"date-time":"2022-09-16T00:00:00Z","timestamp":1663286400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Comput. Cult. Herit."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>This article reports on a study using machine learning to identify incidences and shifting dynamics of hate speech in social media archives. To better cope with the archival processing need for such large-scale and fast evolving archives, we propose the Data-driven and Circulating Archival Processing (DCAP) method. As a proof-of-concept, our study focuses on an English language Twitter archive relating to COVID-19: Tweets were repeatedly scraped between February and June 2020, ingested and aggregated within the COVID-19 Hate Speech Twitter Archive (CHSTA), and analyzed for hate speech using the Generative Adversarial Network\u2013inspired DCAP method. Outcomes suggest that it is possible to use machine learning and data analytics to surface and substantiate trends from CHSTA and similar social media archives that could provide immediately useful knowledge for crisis response, in controversial situations, or for public policy development, as well as for subsequent historical analysis. The approach shows potential for integrating multiple aspects of the archival workflow and supporting automatic iterative redescription and reappraisal activities in ways that make them more accountable and more rapidly responsive to changing societal interests and unfolding developments.<\/jats:p>","DOI":"10.1145\/3547146","type":"journal-article","created":{"date-parts":[[2022,7,8]],"date-time":"2022-07-08T09:02:20Z","timestamp":1657270940000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Using Machine Learning to Enhance Archival Processing of Social Media Archives"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7962-9113","authenticated-orcid":false,"given":"Lizhou","family":"Fan","sequence":"first","affiliation":[{"name":"University of Michigan, Ann Arbor, MI"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4561-9566","authenticated-orcid":false,"given":"Zhanyuan","family":"Yin","sequence":"additional","affiliation":[{"name":"The University of Chicago, Chicago, IL"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3776-9211","authenticated-orcid":false,"given":"Huizi","family":"Yu","sequence":"additional","affiliation":[{"name":"Brown University, Providence, RI"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4897-7780","authenticated-orcid":false,"given":"Anne J.","family":"Gilliland","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles, CA"}]}],"member":"320","published-online":{"date-parts":[[2022,9,16]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_3_2_2_2","DOI":"10.1007\/s10502-019-09325-9"},{"doi-asserted-by":"publisher","key":"e_1_3_2_3_2","DOI":"10.21428\/62b3421f"},{"doi-asserted-by":"publisher","key":"e_1_3_2_4_2","DOI":"10.1177\/1461444816676645"},{"volume-title":"Introducing Cultural Heritage Informatics into the Curriculum of LIS Education","unstructured":"Association for Information Science and Technology (ASIS&T) [n.d.]. Introducing Cultural Heritage Informatics into the Curriculum of LIS Education. Retrieved October 26, 2020 from https:\/\/www.asist.org\/meetings-events\/webinars\/introducing-cultural-heritage-informatics-into-the-curriculum-of-lis-education\/.","key":"e_1_3_2_5_2"},{"issue":"3","key":"e_1_3_2_6_2","first-page":"671","article-title":"Big data\u2019s disparate impact","volume":"104","author":"Barocas Solon","year":"2016","unstructured":"Solon Barocas and Andrew D. Selbst. 2016. Big data\u2019s disparate impact. Calif. Law Rev. 104, 3 (2016), 671\u2013732. http:\/\/www.jstor.org\/stable\/24758720.","journal-title":"Calif. Law Rev."},{"key":"e_1_3_2_7_2","volume-title":"Race after Technology: Abolitionist Tools for the New Jim Code","author":"Benjamin Ruha","year":"2019","unstructured":"Ruha Benjamin. 2019. Race after Technology: Abolitionist Tools for the New Jim Code. Polity, Cambridge, UK."},{"doi-asserted-by":"crossref","unstructured":"Kathleen Margaret Brennan. 2019. Believe me: Authenticity federal social media use and the problematized record in the American digital public sphere. Crit. Libr. Inf. Stud. 2 2 (2019).","key":"e_1_3_2_8_2","DOI":"10.24242\/jclis.v2i2.72"},{"unstructured":"Axel Bruns. 2018. The Library of Congress Twitter Archive: A Failure of Historic Proportions. Retrieved from https:\/\/medium.com\/dmrc-at-large\/the-library-of-congress-twitter-archive-a-failure-of-historic-proportions-6dc1c3bc9e2c.","key":"e_1_3_2_9_2"},{"doi-asserted-by":"publisher","key":"e_1_3_2_10_2","DOI":"10.1093\/oso\/9780190493028.001.0001"},{"doi-asserted-by":"publisher","key":"e_1_3_2_11_2","DOI":"10.1140\/epjds\/s13688-016-0072-6"},{"doi-asserted-by":"crossref","unstructured":"Michelle Caswell and Marika Cifor. 2019. Neither a beginning nor an end: Applying an ethics of care to digital archival collections. In the Routledge International Handbook of New Digital Practices in Galleries Libraries Archives Museums and Heritage Sites . Routledge 159\u2013168.","key":"e_1_3_2_12_2","DOI":"10.4324\/9780429506765-14"},{"key":"e_1_3_2_13_2","author":"Caswell Michelle","year":"2017","unstructured":"Michelle Caswell, Ricardo Punzalan, and T-Kay Sangwand (Eds.). 2017. J. Crit. Libr. Inf. Stud. 1, 2 (July 2017).","journal-title":"J. Crit. Libr. Inf. Stud."},{"doi-asserted-by":"publisher","key":"e_1_3_2_14_2","DOI":"10.2196\/19273"},{"doi-asserted-by":"publisher","key":"e_1_3_2_15_2","DOI":"10.3389\/fcomm.2020.00039"},{"key":"e_1_3_2_16_2","doi-asserted-by":"crossref","DOI":"10.1609\/icwsm.v11i1.14955","article-title":"Automated hate speech detection and the problem of offensive language","author":"Davidson Thomas","year":"2017","unstructured":"Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In Proceedings of the International AAAI Conference on Web and Social Media (2017).","journal-title":"Proceedings of the International AAAI Conference on Web and Social Media"},{"volume-title":"COVID-19 in Washington, D.C. Twitter Archive","unstructured":"DC Oral History Collaborative [n.d.]. COVID-19 in Washington, D.C. Twitter Archive. Retrieved October 27, 2020 from https:\/\/digdc.dclibrary.org\/islandora\/object\/dcplislandora%3A237558.","key":"e_1_3_2_17_2"},{"volume-title":"Documenting COVID-19","unstructured":"DocNow [n.d.]. Documenting COVID-19. Retrieved October, 27, 2020 from https:\/\/docs.google.com\/document\/d\/1v5tso8spFq6SpW53h2OJULcdRoPEbyI6xpah31kW-H0\/edit.","key":"e_1_3_2_18_2"},{"volume-title":"Documenting the Now","unstructured":"DocNow [n.d.]. Documenting the Now. Retrieved October 27, 2020 from https:\/\/www.docnow.io\/.","key":"e_1_3_2_19_2"},{"doi-asserted-by":"publisher","key":"e_1_3_2_20_2","DOI":"10.1007\/BF02435625"},{"volume-title":"Encoded Archival Description","unstructured":"Encoded Archival Description [n.d.]. Encoded Archival Description. Retrieved October 27, 2020 from https:\/\/www.loc.gov\/ead\/.","key":"e_1_3_2_21_2"},{"key":"e_1_3_2_22_2","volume-title":"Digital Memory and the Archive","author":"Ernst Wolfgang","year":"2013","unstructured":"Wolfgang Ernst and Jussi Parikka. 2013. Digital Memory and the Archive. University of Minnesota Press."},{"doi-asserted-by":"publisher","key":"e_1_3_2_23_2","DOI":"10.1002\/pra2.475"},{"doi-asserted-by":"publisher","key":"e_1_3_2_24_2","DOI":"10.1002\/pra2.313"},{"doi-asserted-by":"publisher","key":"e_1_3_2_25_2","DOI":"10.18653\/v1\/W17-3013"},{"key":"e_1_3_2_26_2","first-page":"12","article-title":"Permeable binaries, societal grand challenges, and the roles of the twenty-first-century archival and recordkeeping profession","author":"Gilliland Anne J.","year":"2015","unstructured":"Anne J. Gilliland. 2015. Permeable binaries, societal grand challenges, and the roles of the twenty-first-century archival and recordkeeping profession. Archifacts (December 2015), 12\u201330.","journal-title":"Archifacts"},{"key":"e_1_3_2_27_2","first-page":"685","article-title":"Designing expert systems for archival evaluation and processing of computer mediated communications: Frameworks and methods","author":"Gilliland Anne J.","year":"2016","unstructured":"Anne J. Gilliland. 2016. Designing expert systems for archival evaluation and processing of computer mediated communications: Frameworks and methods. In Research in the Archival Multiverse, Anne J. Gilliland, Sue McKemmish, and Andrew J. Lau (Eds.). Monash University Press, Melbourne, 685\u2013721.","journal-title":"Research in the Archival Multiverse"},{"doi-asserted-by":"publisher","key":"e_1_3_2_28_2","DOI":"10.2218\/ijdc.v14i1.636"},{"doi-asserted-by":"publisher","key":"e_1_3_2_29_2","DOI":"10.17723\/aarc.55.2.48530jm12848r7p2"},{"doi-asserted-by":"publisher","key":"e_1_3_2_30_2","DOI":"10.1300\/J141v04n03_12"},{"key":"e_1_3_2_31_2","first-page":"2672","article-title":"Generative adversarial nets","author":"Goodfellow Ian J.","year":"2014","unstructured":"Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems, 2672\u20132680.","journal-title":"Proceedings of the 27th International Conference on Neural Information Processing Systems"},{"doi-asserted-by":"publisher","key":"e_1_3_2_32_2","DOI":"10.1007\/s12103-020-09545-1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_33_2","DOI":"10.17723\/aarc.68.2.c741823776k65863"},{"key":"e_1_3_2_34_2","first-page":"468","article-title":"Classifying racist texts using a support vector machine","author":"Greevy Edel","year":"2004","unstructured":"Edel Greevy and Alan F. Smeaton. 2004. Classifying racist texts using a support vector machine. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 468\u2013469.","journal-title":"Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval"},{"unstructured":"Hatebase Inc.2020. Hatebase. Retrieved April. 26 2020 from https:\/\/hatebase.org\/.","key":"e_1_3_2_35_2"},{"doi-asserted-by":"publisher","key":"e_1_3_2_36_2","DOI":"10.15585\/mmwr.mm6922e1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_37_2","DOI":"10.1609\/aaai.v27i1.8539"},{"doi-asserted-by":"publisher","key":"e_1_3_2_38_2","DOI":"10.4018\/jiscrm.2011100101"},{"doi-asserted-by":"crossref","unstructured":"Mark Latonero and Irina Shklovski. 2011. Emergency management Twitter and social media evangelism. International Journal of Information Systems for Crisis Response and Management (IJISCRAM) 3 4 (2011) 1\u201316.","key":"e_1_3_2_39_2","DOI":"10.4018\/jiscrm.2011100101"},{"doi-asserted-by":"publisher","key":"e_1_3_2_40_2","DOI":"10.1109\/BigData.2017.8258179"},{"issue":"2","key":"e_1_3_2_41_2","first-page":"60","article-title":"So you want to implement automatic categorization?","volume":"37","author":"Lubbes R. Kirk","year":"2003","unstructured":"R. Kirk Lubbes. 2003. So you want to implement automatic categorization? Inf. Manage. 37, 2 (2003), 60.","journal-title":"Inf. Manage."},{"key":"e_1_3_2_42_2","first-page":"22","article-title":"Metadata strategies and archival description: Comparing apples to oranges","volume":"39","author":"MacNeil Heather","year":"1995","unstructured":"Heather MacNeil. 1995. Metadata strategies and archival description: Comparing apples to oranges. Archivaria 39 (1995), 22\u201332.","journal-title":"Archivaria"},{"doi-asserted-by":"publisher","key":"e_1_3_2_43_2","DOI":"10.1080\/1369118X.2020.1739731"},{"key":"e_1_3_2_44_2","first-page":"299","article-title":"Do characters abuse more than words?","author":"Mehdad Yashar","year":"2016","unstructured":"Yashar Mehdad and Joel Tetreault. 2016. Do characters abuse more than words? In Proceedings of the SIGDIAL Conference, 299\u2013303.","journal-title":"Proceedings of the SIGDIAL Conference"},{"doi-asserted-by":"publisher","key":"e_1_3_2_45_2","DOI":"10.1045\/march2000-moore-pt1"},{"key":"e_1_3_2_46_2","volume-title":"Concepts in Distributed Data Management or History of the DICE Group","author":"Moore Reagan","year":"2015","unstructured":"Reagan Moore, Arcot Rajasekar, Michael Wan, Wayne Schroeder, Antoine de Torcy, Sheau-Yen Chen, Mike Conway, and Hao Xu. 2015. Concepts in Distributed Data Management or History of the DICE Group. Retrieved October 27, 2020 from https:\/\/irods.org\/uploads\/2015\/01\/DICE-History.pdf."},{"doi-asserted-by":"publisher","key":"e_1_3_2_47_2","DOI":"10.2307\/j.ctt1pwt9w5"},{"volume-title":"The Fundamental Standard for Digital Preservation","unstructured":"OAIS Reference Model [n.d.]. The Fundamental Standard for Digital Preservation. Retrieved October 26, 2020 from http:\/\/www.oais.info\/.","key":"e_1_3_2_48_2"},{"key":"e_1_3_2_49_2","volume-title":"Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy","author":"O\u2019Neil Cathy","year":"2016","unstructured":"Cathy O\u2019Neil. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown Books, Largo, MD."},{"doi-asserted-by":"publisher","key":"e_1_3_2_50_2","DOI":"10.18653\/v1\/W17-3006"},{"doi-asserted-by":"publisher","key":"e_1_3_2_51_2","DOI":"10.1177\/1461444817748483"},{"doi-asserted-by":"publisher","key":"e_1_3_2_52_2","DOI":"10.1111\/1468-5973.12196"},{"doi-asserted-by":"publisher","key":"e_1_3_2_53_2","DOI":"10.1080\/10841806.2020.1782128"},{"key":"e_1_3_2_54_2","volume-title":"Doing Digital Methods","author":"Rogers Richard","year":"2019","unstructured":"Richard Rogers. 2019. Doing Digital Methods. MIT Press, Cambridge, MA."},{"issue":"6","key":"e_1_3_2_55_2","first-page":"22","article-title":"Rise of the machines","volume":"43","author":"Santangelo James","year":"2009","unstructured":"James Santangelo. 2009. Rise of the machines. Inf. Manage. 43, 6 (2009), 22.","journal-title":"Inf. Manage."},{"doi-asserted-by":"publisher","key":"e_1_3_2_56_2","DOI":"10.1177\/2053951717738104"},{"doi-asserted-by":"publisher","key":"e_1_3_2_57_2","DOI":"10.1016\/j.ijinfomgt.2015.07.001"},{"volume-title":"Cultural Heritage Informatics","unstructured":"Society of the American Archivists [n.d.]. Cultural Heritage Informatics. Retrieved October 27, 2020 from https:\/\/www2.archivists.org\/dae\/kent-state-university\/cultural-heritage-informatics.","key":"e_1_3_2_58_2"},{"key":"e_1_3_2_59_2","first-page":"1","article-title":"The anxiety of being Asian American: Hate crimes and negative biases during the COVID-19 pandemic","author":"Tessler Hannah","year":"2020","unstructured":"Hannah Tessler, Meera Choi, and Grace Kao. 2020. The anxiety of being Asian American: Hate crimes and negative biases during the COVID-19 pandemic. Am. J. Crim. Just. (2020), 1\u201311.","journal-title":"Am. J. Crim. Just."},{"unstructured":"DILCIS Board. [n.d.]. The Digital Information LifeCycle Interoperability Standards Board. Retrieved October 26 2020 from https:\/\/dilcis.eu\/.","key":"e_1_3_2_60_2"},{"unstructured":"Nicol Turner-Lee Paul Resnick and Genie Barton. 2019. Algorithmic Bias Detection and Mitigation: Best Practices and Policies to Reduce Consumer Harms. Retrieved from https:\/\/www.brookings.edu\/research\/algorithmic-bias-detection-and-mitigation-best-practices-and-policies-to-reduce-consumer-harms\/.","key":"e_1_3_2_61_2"},{"doi-asserted-by":"publisher","key":"e_1_3_2_62_2","DOI":"10.1109\/BigData47090.2019.9005682"},{"doi-asserted-by":"publisher","key":"e_1_3_2_63_2","DOI":"10.1093\/idpl\/ipx005"},{"key":"e_1_3_2_64_2","first-page":"11","article-title":"Managing the present: Metadata as archival description","volume":"39","author":"Wallace David A.","year":"1995","unstructured":"David A. Wallace. 1995. Managing the present: Metadata as archival description. Archivaria 39 (February 1995), 11\u201321.","journal-title":"Archivaria"},{"doi-asserted-by":"publisher","key":"e_1_3_2_65_2","DOI":"10.18653\/v1\/N16-2013"},{"doi-asserted-by":"publisher","key":"e_1_3_2_66_2","DOI":"10.1093\/bjc\/azz049"},{"doi-asserted-by":"publisher","key":"e_1_3_2_67_2","DOI":"10.1007\/s10502-014-9233-1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_68_2","DOI":"10.1002\/asi.24357"},{"doi-asserted-by":"publisher","key":"e_1_3_2_69_2","DOI":"10.3828\/comma.2013.2.2"},{"doi-asserted-by":"publisher","key":"e_1_3_2_70_2","DOI":"10.1109\/BigData50022.2020.9377930"},{"doi-asserted-by":"crossref","unstructured":"Ziqi Zhang and Lei Luo. 2019. Hate speech detection: A solved problem? The challenging case of long tail on Twitter. Semantic Web 10 5 (2019) 925\u2013945.","key":"e_1_3_2_71_2","DOI":"10.3233\/SW-180338"}],"container-title":["Journal on Computing and Cultural Heritage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3547146","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3547146","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:55Z","timestamp":1750186975000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3547146"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,16]]},"references-count":70,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3547146"],"URL":"https:\/\/doi.org\/10.1145\/3547146","relation":{},"ISSN":["1556-4673","1556-4711"],"issn-type":[{"type":"print","value":"1556-4673"},{"type":"electronic","value":"1556-4711"}],"subject":[],"published":{"date-parts":[[2022,9,16]]},"assertion":[{"value":"2020-11-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-10-18","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-09-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}