{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T06:09:32Z","timestamp":1740809372398,"version":"3.38.0"},"reference-count":40,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2022,7,2]],"date-time":"2022-07-02T00:00:00Z","timestamp":1656720000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/100000208","name":"Institute of Museum and Library Services","doi-asserted-by":"publisher","award":["#LG-86-18-0061-18"],"award-info":[{"award-number":["#LG-86-18-0061-18"]}],"id":[{"id":"10.13039\/100000208","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2024,4]]},"abstract":"<jats:p> Data augmentation uses artificially created examples to support supervised machine learning, adding robustness to the resulting models and helping to account for limited availability of labelled data. We apply and evaluate a synthetic data approach to relationship classification in digital libraries, generating artificial books with relationships that are common in digital libraries but not easily inferred from existing metadata. Artificial books are generated by remixing existing texts into synthetically constructed formats. We find that for classification on whole\u2013part relationships between books, synthetic data improves a deep neural network classifier by 91%. Furthermore, we consider the ability of synthetic data to learn a useful new text relationship class from fully artificial training data. 
<\/jats:p>","DOI":"10.1177\/01655515221093031","type":"journal-article","created":{"date-parts":[[2022,7,2]],"date-time":"2022-07-02T08:46:01Z","timestamp":1656751561000},"page":"434-446","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["Improving text relationship modelling with artificial data"],"prefix":"10.1177","volume":"50","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9058-2280","authenticated-orcid":false,"given":"Peter","family":"Organisciak","sequence":"first","affiliation":[{"name":"University of Denver, USA"}]},{"given":"Maggie","family":"Ryan","sequence":"additional","affiliation":[{"name":"University of Denver, USA"}]}],"member":"179","published-online":{"date-parts":[[2022,7,2]]},"reference":[{"first-page":"1","volume-title":"2017 ACM\/IEEE joint conference on digital libraries (JCDL)","author":"Bamman D","key":"bibr1-01655515221093031"},{"key":"bibr2-01655515221093031","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-15742-5_40"},{"key":"bibr3-01655515221093031","unstructured":"Gatenby J, Greene RO, Oskins WM et al. GLIMIR: manifestation and content clustering within WorldCat. Code4lib J 2012; (17), https:\/\/journal.code4lib.org\/articles\/6812"},{"volume-title":"Conference on empirical methods on natural language processing","author":"Schofield A","key":"bibr4-01655515221093031"},{"key":"bibr5-01655515221093031","doi-asserted-by":"publisher","DOI":"10.29085\/9781856047159"},{"key":"bibr6-01655515221093031","doi-asserted-by":"publisher","DOI":"10.1515\/9783110962451"},{"issue":"6014","key":"bibr7-01655515221093031","first-page":"176","volume-title":"Science","volume":"331","author":"Michel JB","year":"2011"},{"key":"bibr8-01655515221093031","unstructured":"York J. Building a future by preserving our past: the preservation infrastructure of HathiTrust digital library. 
In: World library and information congress: 76th IFLA general conference and assembly, Gothenburg, 10\u201315 August 2010, pp. 10\u201315, https:\/\/www.hathitrust.org\/documents\/hathitrust-ifla-201008.pdf"},{"key":"bibr9-01655515221093031","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1141"},{"key":"bibr10-01655515221093031","doi-asserted-by":"publisher","DOI":"10.1038\/nature06137"},{"volume-title":"Distant reading","year":"2013","author":"Moretti F.","key":"bibr11-01655515221093031"},{"key":"bibr12-01655515221093031","unstructured":"Manovich L. Cultural analytics: visualising cultural patterns in the era of more media. Domus, March 2009, http:\/\/manovich.net\/content\/04-projects\/063-cultural-analytics-visualizing-cultural-patterns\/60_article_2009.pdf"},{"key":"bibr13-01655515221093031","unstructured":"Smith DA, Cordell R. A research agenda for historical and multilingual optical character recognition. Technical report, NULab, Northeastern University, 2018, https:\/\/ocr.northeastern.edu\/report"},{"first-page":"109","volume-title":"Proceedings of the 6th ACM\/IEEE-CS joint conference on digital libraries (JCDL \u201906)","author":"Manmatha R","key":"bibr14-01655515221093031"},{"first-page":"2363","volume-title":"Proceedings of the 56th annual meeting of the association for computational linguistics","author":"Dong R","key":"bibr15-01655515221093031"},{"key":"bibr16-01655515221093031","unstructured":"Nikolenko SI. Synthetic data for deep learning. arXiv: 1909.11512, 2019, http:\/\/arxiv.org\/abs\/1909.11512"},{"first-page":"117","volume-title":"2018 international interdisciplinary PhD workshop (IIPhDW)","author":"Miko\u0142ajczyk A","key":"bibr17-01655515221093031"},{"first-page":"1","volume-title":"2016 international conference on digital image computing: techniques and applications (DICTA)","author":"Wong SC","key":"bibr18-01655515221093031"},{"key":"bibr19-01655515221093031","unstructured":"Perez L, Wang J. 
The effectiveness of data augmentation in image classification using deep learning. arXiv: 1712.04621, 2017, http:\/\/arxiv.org\/abs\/1712.04621"},{"first-page":"909","volume-title":"2019 IEEE International Conference on Image Processing (ICIP)","author":"You Z","key":"bibr20-01655515221093031"},{"key":"bibr21-01655515221093031","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.151"},{"key":"bibr22-01655515221093031","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_7"},{"key":"bibr23-01655515221093031","unstructured":"Wei J, Zou K. EDA: easy data augmentation techniques for boosting performance on text classification tasks. arXiv: 1901.11196, 2019, http:\/\/arxiv.org\/abs\/1901.11196"},{"key":"bibr24-01655515221093031","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1190"},{"key":"bibr25-01655515221093031","first-page":"1423","volume-title":"2017 14th IAPR international conference on document analysis and recognition (ICDAR)","volume":"1","author":"Chiron G"},{"first-page":"1588","volume-title":"2019 International Conference on Document Analysis and Recognition (ICDAR)","author":"Rigaud C","key":"bibr26-01655515221093031"},{"key":"bibr27-01655515221093031","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-22871-2_58"},{"first-page":"70","volume-title":"Proceedings of the 21st Nordic conference on computational linguistics","author":"Drobac S","key":"bibr28-01655515221093031"},{"volume":"10696","volume-title":"Tenth international conference on machine vision (ICMV 2017)","author":"Chernyshova YS","key":"bibr29-01655515221093031"},{"key":"bibr30-01655515221093031","unstructured":"Krishnan P, Jawahar CV. Generating synthetic data for text recognition. 
arXiv: 1608.04224, 2016, http:\/\/arxiv.org\/abs\/1608.04224"},{"key":"bibr31-01655515221093031","doi-asserted-by":"publisher","DOI":"10.1109\/WACV45572.2020.9093392"},{"key":"bibr32-01655515221093031","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.468"},{"key":"bibr33-01655515221093031","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-93040-4_28"},{"key":"bibr34-01655515221093031","unstructured":"Antoniou A, Storkey A, Edwards H. Data augmentation generative adversarial networks. arXiv: 1711.04340, 2018, http:\/\/arxiv.org\/abs\/1711.04340"},{"first-page":"289","volume-title":"2018 IEEE 15th international symposium on biomedical Imaging (ISBI 2018)","author":"Frid-Adar M","key":"bibr35-01655515221093031"},{"key":"bibr36-01655515221093031","unstructured":"Organisciak P, Capitanu B, Underwood T et al. Access to billions of pages for large-scale text analysis. In: iConference 2017 proceedings, vol. 2. Wuhan, China: iSchools, https:\/\/www.ideals.illinois.edu\/bitstream\/handle\/2142\/96256\/iconf-ef.pdf?sequence=2"},{"key":"bibr37-01655515221093031","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"issue":"1","key":"bibr38-01655515221093031","first-page":"1929","volume":"15","author":"Srivastava N","year":"2014","journal-title":"J Mach Learn Res"},{"key":"bibr39-01655515221093031","unstructured":"Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv: 1412.6980, 2017, http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"bibr40-01655515221093031","unstructured":"Dawson J. 
Mistaikes in books, 2016, https:\/\/rarebooksdigest.com\/2016\/07\/05\/mistaikes-in-books\/"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515221093031","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/01655515221093031","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515221093031","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T02:21:11Z","timestamp":1740795671000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/01655515221093031"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,2]]},"references-count":40,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,4]]}},"alternative-id":["10.1177\/01655515221093031"],"URL":"https:\/\/doi.org\/10.1177\/01655515221093031","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"type":"print","value":"0165-5515"},{"type":"electronic","value":"1741-6485"}],"subject":[],"published":{"date-parts":[[2022,7,2]]}}}