{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T17:17:20Z","timestamp":1780420640189,"version":"3.54.1"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2022,1,12]],"date-time":"2022-01-12T00:00:00Z","timestamp":1641945600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,12]],"date-time":"2022-01-12T00:00:00Z","timestamp":1641945600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001638","name":"Dublin City University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001638","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["AI &amp; Soc"],"published-print":{"date-parts":[[2022,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Co-authored by a Computer Scientist and a Digital Humanist, this article examines the challenges faced by cultural heritage institutions in the digital age, which have led to the closure of the vast majority of born-digital archival collections. It focuses particularly on cultural organizations such as libraries, museums and archives, used by historians, literary scholars and other Humanities scholars. Most born-digital records held by cultural organizations are inaccessible due to privacy, copyright, commercial and technical issues. Even when born-digital data are publicly available (as in the case of web archives), users often need to physically travel to repositories such as the British Library or the Biblioth\u00e8que Nationale de France to consult web pages. 
Provided with enough sample data from which to learn and train their models, AI, and more specifically machine learning algorithms, offer the opportunity to improve and ease the access to digital archives by learning to perform complex human tasks. These vary from providing intelligent support for searching the archives to automating tedious and time-consuming tasks. In this article, we focus on sensitivity review as a practical solution to unlock digital archives that would allow archival institutions to make non-sensitive information available. This promise to make archives more accessible does not come free of warnings for potential pitfalls and risks: inherent errors, \"black box\" approaches that make the algorithm inscrutable, and risks related to bias, fake, or partial information. Our central argument is that AI can deliver its promise to make digital archival collections more accessible, but it also creates new challenges - particularly in terms of ethics. In the conclusion, we insist on the importance of fairness, accountability and transparency in the process of making digital archives more accessible.<\/jats:p>","DOI":"10.1007\/s00146-021-01367-x","type":"journal-article","created":{"date-parts":[[2022,1,12]],"date-time":"2022-01-12T18:02:40Z","timestamp":1642010560000},"page":"823-835","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":50,"title":["Unlocking digital archives: cross-disciplinary perspectives on AI and born-digital 
data"],"prefix":"10.1007","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2680-4571","authenticated-orcid":false,"given":"Lise","family":"Jaillant","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7144-8545","authenticated-orcid":false,"given":"Annalina","family":"Caputo","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2022,1,12]]},"reference":[{"key":"1367_CR1","unstructured":"Alex B, Llewellyn C (2020) Library carpentry: text and data mining. Centre for Data, Culture and Society. University of Edinburgh. http:\/\/librarycarpentry.org\/lc-tdm\/. Accessed 3 May 2021"},{"key":"1367_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1177\/2053951720970576","volume":"7","author":"S Ames","year":"2020","unstructured":"Ames S, Lewis S (2020) Disrupting the library: digital scholarship and Big Data at the National Library of Scotland. Big Data Soc 7:1\u20137. https:\/\/doi.org\/10.1177\/2053951720970576","journal-title":"Big Data Soc"},{"key":"1367_CR3","doi-asserted-by":"publisher","unstructured":"Baron JR, Payne N (2017) Dark archives and E-democracy: strategies for overcoming access barriers to the public record archives of the future. Presented at the 2017 conference for E-democracy and open government (CeDEM), pp 3\u201311. https:\/\/doi.org\/10.1109\/CeDEM.2017.27","DOI":"10.1109\/CeDEM.2017.27"},{"key":"1367_CR4","unstructured":"Bird S, Klein E, Loper E (2019) Natural language processing with python\u2014analyzing text with the natural language toolkit, O'Reilly Media. https:\/\/www.nltk.org\/book\/. Accessed 3 May 2021"},{"key":"1367_CR5","unstructured":"Bolukbasi T, Chang K-W, Zou J et al (2016) Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. 
In: Proceedings of the 30th international conference on neural information processing systems. Curran Associates Inc., Red Hook, NY, USA, pp 4356\u20134364"},{"key":"1367_CR6","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/12549.001.0001","volume-title":"AI ethics","author":"M Coeckelbergh","year":"2020","unstructured":"Coeckelbergh M (2020) AI ethics. MIT Press, Cambridge"},{"key":"1367_CR7","first-page":"300","volume":"22","author":"T Cook","year":"1994","unstructured":"Cook T (1994) Electronic records, paper minds: the revolution in information management and archives in the post-custodial and post-modernist era. Arch Manuscr 22:300\u2013328","journal-title":"Arch Manuscr"},{"key":"1367_CR8","unstructured":"Cordell R (2020) Machine learning + libraries. https:\/\/labs.loc.gov\/static\/labs\/work\/reports\/Cordell-LOC-ML-report.pdf?loclr=blogsig. Accessed 3 May 2021"},{"key":"1367_CR9","doi-asserted-by":"publisher","first-page":"421","DOI":"10.7202\/303466ar","volume":"29","author":"M Dumont-Johnson","year":"1975","unstructured":"Dumont-Johnson M (1975) Peut-on faire l\u2019histoire de la femme\u202f? Revue D\u2019histoire De L\u2019am\u00e9rique Fran\u00e7aise 29:421\u2013428. https:\/\/doi.org\/10.7202\/303466ar","journal-title":"Revue D'histoire De L'am\u00e9rique Fran\u00e7aise"},{"key":"1367_CR10","unstructured":"Flood A (2011) Wendy Cope\u2019s archive sold to British Library. Guardian. https:\/\/www.theguardian.com\/books\/2011\/apr\/20\/wendy-cope-archive-british-library. Accessed 16 Apr 2021"},{"key":"1367_CR11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1749603.1749605","volume":"42","author":"BCM Fung","year":"2010","unstructured":"Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv 42:1\u201353. 
https:\/\/doi.org\/10.1145\/1749603.1749605","journal-title":"ACM Comput Surv"},{"key":"1367_CR12","unstructured":"Gooding P, Terras M, Berube L (2019) Towards user-centric evaluation of UK non-print legal deposit: a digital library futures white paper. http:\/\/elegaldeposit.org\/dlf-white-paper. Accessed 16 Apr 2021"},{"key":"1367_CR13","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1007\/s11023-020-09517-8","volume":"30","author":"T Hagendorff","year":"2020","unstructured":"Hagendorff T (2020) The ethics of AI ethics: an evaluation of guidelines. Mind Mach 30:99\u2013120. https:\/\/doi.org\/10.1007\/s11023-020-09517-8","journal-title":"Mind Mach"},{"key":"1367_CR14","unstructured":"Intellectual Property Office (2014) Exceptions to Copyright: research. https:\/\/assets.publishing.service.gov.uk\/government\/uploads\/system\/uploads\/attachment_data\/file\/375954\/Research.pdf. Accessed 16 Apr 2021"},{"key":"1367_CR15","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1080\/01576895.2019.1640555","volume":"47","author":"L Jaillant","year":"2019","unstructured":"Jaillant L (2019) After the digital revolution: working with emails and born-digital records in literary and publishers\u2019 archives. Arch Manuscr 47:285\u2013304. https:\/\/doi.org\/10.1080\/01576895.2019.1640555","journal-title":"Arch Manuscr"},{"key":"1367_CR16","doi-asserted-by":"publisher","unstructured":"Jaillant L (2020) User experience and access to born-digital data produced by publishers: The case of Carcanet Press. In: Kirschenbaum M et al (eds) Books.Files: preservation of digital assets in the Contemporary Publishing Industry. University of Maryland and the Book Industry Study Group, College Park, MD, pp 38\u201339. https:\/\/doi.org\/10.13016\/1i33-pl0y. 
Accessed 26 Apr 2021","DOI":"10.13016\/1i33-pl0y"},{"key":"1367_CR17","doi-asserted-by":"crossref","unstructured":"Jo ES, Gebru T (2020) Lessons from archives: strategies for collecting sociocultural data in machine learning. In: Proceedings of the 2020 conference on fairness, accountability, and transparency. Association for Computing Machinery, New York, NY, USA, pp 306\u2013316","DOI":"10.1145\/3351095.3372829"},{"key":"1367_CR18","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1038\/s42256-019-0088-2","volume":"1","author":"A Jobin","year":"2019","unstructured":"Jobin A, Ienca M, Vayena E (2019) The global landscape of AI ethics guidelines. Nat Mach Intell 1:389\u2013399. https:\/\/doi.org\/10.1038\/s42256-019-0088-2","journal-title":"Nat Mach Intell"},{"key":"1367_CR19","unstructured":"Living with Machines (2020) AHRC. https:\/\/ahrc.ukri.org\/documents\/publications\/living-with-machines\/. Accessed 16 Apr 2021"},{"key":"1367_CR20","unstructured":"Mackinlay R (2021) Why is most of the 20th century invisible to AI? Information Professional\u2014CILIP: the library and information association. https:\/\/www.cilip.org.uk\/news\/557160\/Why-is-most-of-the-20th-Century-invisible-to-AI.htm. Accessed 16 Apr 2021"},{"key":"1367_CR21","doi-asserted-by":"publisher","first-page":"344","DOI":"10.1353\/lib.2008.0003","volume":"56","author":"KM Mason","year":"2007","unstructured":"Mason KM, Zanish-Belcher T (2007) Raising the archival consciousness: how women\u2019s archives challenge traditional approaches to collecting and use, or, what\u2019s in a name? Libr Trends 56:344\u2013359. https:\/\/doi.org\/10.1353\/lib.2008.0003","journal-title":"Libr Trends"},{"key":"1367_CR22","doi-asserted-by":"crossref","unstructured":"McDonald G, Macdonald C, Ounis I (2020a) Active learning stopping strategies for technology-assisted sensitivity review. In: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval. 
Association for Computing Machinery, New York, NY, USA, pp 2053\u20132056","DOI":"10.1145\/3397271.3401267"},{"key":"1367_CR23","doi-asserted-by":"publisher","DOI":"10.1145\/3417334","author":"G Mcdonald","year":"2020","unstructured":"Mcdonald G, Macdonald C, Ounis I (2020b) How the accuracy and confidence of sensitivity classification affects digital sensitivity review. ACM Trans Inf Syst. https:\/\/doi.org\/10.1145\/3417334","journal-title":"ACM Trans Inf Syst"},{"key":"1367_CR24","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1145\/3458553.3458556","volume":"53","author":"A Olteanu","year":"2021","unstructured":"Olteanu A, Garcia-Gathright J, de Rijke M et al (2021) FACTS-IR: fairness, accountability, confidentiality, transparency, and safety in information retrieval. SIGIR Forum 53:20\u201343. https:\/\/doi.org\/10.1145\/3458553.3458556","journal-title":"SIGIR Forum"},{"key":"1367_CR25","doi-asserted-by":"publisher","DOI":"10.4159\/9780674249509","volume-title":"Burning the books: a history of knowledge under attack","author":"R Ovenden","year":"2020","unstructured":"Ovenden R (2020) Burning the books: a history of knowledge under attack. Harvard University Press, Cambridge"},{"key":"1367_CR26","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1080\/01576895.2017.1408024","volume":"46","author":"J Pledge","year":"2018","unstructured":"Pledge J, Dickens E (2018) Process and progress: working with born-digital material in the Wendy Cope archive at the British Library. Arch Manuscr 46:59\u201369. https:\/\/doi.org\/10.1080\/01576895.2017.1408024","journal-title":"Arch Manuscr"},{"key":"1367_CR27","first-page":"25","volume":"5","author":"PM Quinn","year":"1977","unstructured":"Quinn PM (1977) The archivist as activist. 
Ga Arch 5:25\u201335","journal-title":"Ga Arch"},{"key":"1367_CR28","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1080\/01576895.2018.1502088","volume":"47","author":"G Rolan","year":"2019","unstructured":"Rolan G, Humphries G, Jeffrey L, Samaras E, Antsoupova T, Stuart K (2019) More human than human? Artificial intelligence in the archive. Arch Manuscr 47:179\u2013203. https:\/\/doi.org\/10.1080\/01576895.2018.1502088","journal-title":"Arch Manuscr"},{"key":"1367_CR29","doi-asserted-by":"publisher","first-page":"148","DOI":"10.1002\/asi.23363","volume":"67","author":"D S\u00e1nchez","year":"2016","unstructured":"S\u00e1nchez D, Batet M (2016) C-sanitized: a privacy model for document redaction and sanitization. J Assoc Inf Sci Technol 67:148\u2013163. https:\/\/doi.org\/10.1002\/asi.23363","journal-title":"J Assoc Inf Sci Technol"},{"key":"1367_CR30","doi-asserted-by":"publisher","first-page":"305","DOI":"10.1080\/01576895.2019.1622138","volume":"47","author":"J Schneider","year":"2019","unstructured":"Schneider J et al (2019) Appraising, processing, and providing access to email in contemporary literary archives. Arch Manuscr 47:305\u2013326. https:\/\/doi.org\/10.1080\/01576895.2019.1622138","journal-title":"Arch Manuscr"},{"key":"1367_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/505282.505283","volume":"34","author":"F Sebastiani","year":"2002","unstructured":"Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34:1\u201347. https:\/\/doi.org\/10.1145\/505282.505283","journal-title":"ACM Comput Surv"},{"key":"1367_CR32","unstructured":"Some sort of record seemed vital: British Library acquires the archive of Wendy Cope (2011) British Library. https:\/\/www.bl.uk\/press-releases\/2011\/april\/some-sort-of-record-seemed-vital-british-library-acquires-the-archive-of-wendy-cope. 
Accessed 16 Apr 2021"},{"key":"1367_CR33","unstructured":"Souza RR, Coelho FC, Shah R, Connelly M (2016) Using Artificial Intelligence to identify state secrets"},{"issue":"5","key":"1367_CR34","doi-asserted-by":"publisher","first-page":"557","DOI":"10.1142\/S0218488502001648","volume":"10","author":"L Sweeney","year":"2002","unstructured":"Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl-Based Syst 10(5):557\u2013570. https:\/\/doi.org\/10.1142\/S0218488502001648","journal-title":"Int J Uncertain Fuzziness Knowl-Based Syst"},{"key":"1367_CR35","unstructured":"The National Archives (2016) The application of technology-assisted review to born-digital records transfer, inquiries and beyond. https:\/\/www.nationalarchives.gov.uk\/documents\/technology-assisted-review-to-born-digital-records-transfer.pdf Accessed 10 May 2021"},{"key":"1367_CR36","volume-title":"Privacy is power: why and how you should take back control of your data","author":"C V\u00e9liz","year":"2020","unstructured":"V\u00e9liz C (2020) Privacy is power: why and how you should take back control of your data. Bantam Press, London"},{"key":"1367_CR37","unstructured":"Verborgh R (2019) Getting my personal data out of Facebook. https:\/\/ruben.verborgh.org\/facebook\/. Accessed 22 July 2021"},{"key":"1367_CR38","doi-asserted-by":"crossref","unstructured":"Winters J (2017) Coda: web archives for humanities research\u2014some reflections. In: Br\u00fcgger N, Schroeder R. The Web as History. UCL Press, London, pp 238\u2013248. http:\/\/discovery.ucl.ac.uk\/1542998\/1\/The-Web-as-History.pdf. Accessed 16 Apr 2021","DOI":"10.2307\/j.ctt1mtz55k.18"},{"key":"1367_CR39","doi-asserted-by":"crossref","first-page":"2","DOI":"10.2352\/issn.2168-3204.2015.12.1.art00002","volume":"2015","author":"K Woods","year":"2015","unstructured":"Woods K, Lee CA (2015) Redacting private and sensitive information in born-digital collections. 
Arch Conf 2015:2\u20137","journal-title":"Arch Conf"},{"key":"1367_CR40","volume-title":"The age of surveillance capitalism : the fight for a human future at the new frontier of power","author":"S Zuboff","year":"2019","unstructured":"Zuboff S (2019) The age of surveillance capitalism\u202f: the fight for a human future at the new frontier of power. PublicAffairs, New York"}],"container-title":["AI &amp; SOCIETY"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00146-021-01367-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00146-021-01367-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00146-021-01367-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,15]],"date-time":"2023-11-15T19:20:51Z","timestamp":1700076051000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00146-021-01367-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,12]]},"references-count":40,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9]]}},"alternative-id":["1367"],"URL":"https:\/\/doi.org\/10.1007\/s00146-021-01367-x","relation":{},"ISSN":["0951-5666","1435-5655"],"issn-type":[{"value":"0951-5666","type":"print"},{"value":"1435-5655","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,12]]},"assertion":[{"value":"18 May 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 November 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 January 
2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}