{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T07:27:16Z","timestamp":1779262036403,"version":"3.51.4"},"reference-count":17,"publisher":"China Science Publishing & Media Ltd.","issue":"1","license":[{"start":{"date-parts":[[2022,9,28]],"date-time":"2022-09-28T00:00:00Z","timestamp":1664323200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,3,8]]},"abstract":"<jats:title>ABSTRACT<\/jats:title><jats:p>Automated metadata annotation is only as good as training dataset, or rules that are available for the domain. It's important to learn what type of data content a pre-trained machine learning algorithm has been trained on to understand its limitations and potential biases. Consider what type of content is readily available to train an algorithm\u2014what's popular and what's available. However, scholarly and historical content is often not available in consumable, homogenized, and interoperable formats at the large volume that is required for machine learning. There are exceptions such as science and medicine, where large, well documented collections are available. This paper presents the current state of automated metadata annotation in cultural heritage and research data, discusses challenges identified from use cases, and proposes solutions.<\/jats:p>","DOI":"10.1162\/dint_a_00162","type":"journal-article","created":{"date-parts":[[2022,9,28]],"date-time":"2022-09-28T22:13:43Z","timestamp":1664403223000},"page":"122-138","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":21,"title":["Automated metadata annotation: What is and is not possible with machine learning"],"prefix":"10.3724","volume":"5","author":[{"given":"Mingfang","family":"Wu","sequence":"first","affiliation":[{"name":"Australian Research Data Commons, Australian Research Data Commons, Melbourne, Australia, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hans","family":"Brandhorst","sequence":"additional","affiliation":[{"name":"Iconclass, Voorscoten, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maria-Cristina","family":"Marinescu","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center, Barcelona, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joaquim More","family":"Lopez","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center, Barcelona, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Margorie","family":"Hlava","sequence":"additional","affiliation":[{"name":"Access Innovations, Albuquerque, New Mexico, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joseph","family":"Busch","sequence":"additional","affiliation":[{"name":"Taxonomy Strategies, Washington, DC, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"2026","published-online":{"date-parts":[[2023,3,8]]},"reference":[{"key":"2023030801042955000_","volume-title":"AI","author":"Teztecch. Artificial Intelligence"},{"key":"2023030801042955000_","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1007\/978-3-642-32560-1_7","article-title":"Artificial general intelligence and the human mental model","volume-title":"Singularity Hypothesis: A scientific and philosophical assessment","author":"Yampolskiy","year":"2012"},{"key":"2023030801042955000_","volume-title":"Understanding Metadata: What is metadata, and what is in it for: A Primer","author":"Riley"},{"key":"2023030801042955000_","volume-title":"Possibilities and Provocations","author":"Machine Learning, Libraries, and Cross-Disciplinary Research","year":"2020"},{"key":"2023030801042955000_","volume-title":"Machine learning meets library archives: Image analysis to generate descriptive metadata","author":"Maringanti","year":"2019"},{"issue":"1","key":"2023030801042955000_","first-page":"265","article-title":"Annif and Finto AI: Developing and implementing automated subject indexing","volume":"13","author":"Suominen","year":"2022","journal-title":"JLIS.It"},{"key":"2023030801042955000_","volume-title":"The cross-depiction problem: Computer vision algorithms for recognising objects in artwork and in photographs","author":"Cai","year":"2015"},{"key":"2023030801042955000_","first-page":"721","article-title":"The art of detection","volume-title":"European Conference on Computer Vision","author":"Crowley","year":"2016"},{"issue":"4","key":"2023030801042955000_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3458885","article-title":"A dataset and a convolutional model for iconography classification in paintings","volume":"14","author":"Milani","year":"2021","journal-title":"Journal on Computing and Cultural Heritage"},{"key":"2023030801042955000_","doi-asserted-by":"publisher","first-page":"248","DOI":"10.1109\/CVPR.2009.5206848","volume-title":"ImageNet: A large-scale hierarchical image database","author":"Deng","year":"2009"},{"key":"2023030801042955000_","doi-asserted-by":"crossref","first-page":"73694","DOI":"10.1109\/ACCESS.2019.2921101","article-title":"A deep learning perspective on beauty, sentiment, and remembrance of art","volume":"7","author":"Cetinic","year":"2019","journal-title":"IEEE Access"},{"key":"2023030801042955000_","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/9963.001.0001","volume-title":"Big Data, Little Data, No Data: Scholarship in the Networked World","author":"Borgman","year":"2015"},{"key":"2023030801042955000_","doi-asserted-by":"crossref","first-page":"160018","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR Guiding Principles for scientific data management and stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Sci Data"},{"issue":"4","key":"2023030801042955000_","doi-asserted-by":"crossref","first-page":"50","DOI":"10.3390\/info10040150","article-title":"Text classification Algorithms: A survey","volume":"10","author":"Kowsari","year":"2019","journal-title":"Information"},{"key":"2023030801042955000_","volume-title":"Social classification and folksonomy in art museums: Early data from the steve","author":"Trant"},{"key":"2023030801042955000_","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1007\/s11263-015-0812-2","article-title":"Do we need more training data?","volume":"119","author":"Zhu","year":"2016","journal-title":"Int J Comput Vis"},{"issue":"3","key":"2023030801042955000_","doi-asserted-by":"crossref","first-page":"219","DOI":"10.5771\/0943-7444-2021-3-219","article-title":"Evaluating utility and automatic classification of subject metadata from research data australia","volume":"48","author":"Wu","year":"2021","journal-title":"Knowledge Organization"}],"container-title":["Data Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/dint\/article-pdf\/5\/1\/122\/2074281\/dint_a_00162.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/dint\/article-pdf\/5\/1\/122\/2074281\/dint_a_00162.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T07:41:51Z","timestamp":1741938111000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.sciengine.com\/doi\/10.1162\/dint_a_00162"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"references-count":17,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,3,8]]}},"URL":"https:\/\/doi.org\/10.1162\/dint_a_00162","relation":{},"ISSN":["2641-435X"],"issn-type":[{"value":"2641-435X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023]]},"published":{"date-parts":[[2023]]}}}