{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T18:13:49Z","timestamp":1761156829713,"version":"3.37.3"},"reference-count":60,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2020,3,13]],"date-time":"2020-03-13T00:00:00Z","timestamp":1584057600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-1546079","AST-1903823","CNS-1157162"],"award-info":[{"award-number":["IIS-1546079","AST-1903823","CNS-1157162"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Teaching to Increase Diversity and Equity in STEM"},{"DOI":"10.13039\/100013064","name":"Association of American Colleges and Universities","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100013064","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,4,29]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Computing machines allow quantitative analysis of large databases of text, providing knowledge that is difficult to obtain without using automation. This article describes Universal Data Analysis of Text (UDAT) \u2014a text analysis method that extracts a large set of numerical text content descriptors from text files and performs various pattern recognition tasks such as classification, similarity between classes, correlation between text and numerical values, and query by example. Unlike several previously proposed methods, UDAT is not based on frequency of words and links between certain key words and topics. 
The method is implemented as an open-source software tool that can provide detailed reports about the quantitative analysis of sets of text files, as well as exporting the numerical text content descriptors in the form of comma-separated values files to allow statistical or pattern recognition analysis with external tools. It also allows the identification of specific text descriptors that differentiate between classes or correlate with numerical values and can be applied to problems related to knowledge discovery in domains such as literature and social media. UDAT is implemented as a command-line tool that runs in Windows, and the open source is available and can be compiled in Linux systems. UDAT can be downloaded from http:\/\/people.cs.ksu.edu\/~lshamir\/downloads\/udat.<\/jats:p>","DOI":"10.1093\/llc\/fqaa007","type":"journal-article","created":{"date-parts":[[2020,1,30]],"date-time":"2020-01-30T20:09:47Z","timestamp":1580414987000},"page":"187-208","source":"Crossref","is-referenced-by-count":4,"title":["UDAT: Compound quantitative analysis of text using machine learning"],"prefix":"10.1093","volume":"36","author":[{"given":"Lior","family":"Shamir","sequence":"first","affiliation":[{"name":"Kansas State University, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,3,13]]},"reference":[{"first-page":"183","year":"2008","author":"Agichtein","key":"2021043000041290800_fqaa007-B1"},{"first-page":"729","year":"2005","author":"Anthony","key":"2021043000041290800_fqaa007-B2"},{"first-page":"2200","year":"2010","author":"Baccianella","key":"2021043000041290800_fqaa007-B3"},{"first-page":"291","year":"2010","author":"Becker","key":"2021043000041290800_fqaa007-B4"},{"key":"2021043000041290800_fqaa007-B5","first-page":"1","article-title":"Pattern recognition","volume":"128","author":"Bishop","year":"2006","journal-title":"Machine Learning"},{"issue":"2","key":"2021043000041290800_fqaa007-B6","first-page":"225","article-title":"On the path to a 
methodology for the critique of digital literature","volume":"32","author":"Brand\u00e3o","year":"2017","journal-title":"Digital Scholarship in the Humanities"},{"issue":"2","key":"2021043000041290800_fqaa007-B7","first-page":"234","article-title":"Using models of lexical style to quantify free indirect discourse in modernist fiction","volume":"32","author":"Brooke","year":"2016","journal-title":"Digital Scholarship in the Humanities"},{"issue":"2","key":"2021043000041290800_fqaa007-B8","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1037\/h0076540","article-title":"A computer readability formula designed for machine scoring","volume":"60","author":"Coleman","year":"1975","journal-title":"Journal of Applied Psychology"},{"key":"2021043000041290800_fqaa007-B9","first-page":"17: 1","article-title":"Sentiwordnet: a high-coverage lexical resource for opinion mining","author":"Esuli","year":"2007","journal-title":"Evaluation"},{"year":"1989","author":"Felsenstein","key":"2021043000041290800_fqaa007-B10"},{"year":"2002","author":"Felsenstein","key":"2021043000041290800_fqaa007-B11"},{"issue":"2","key":"2021043000041290800_fqaa007-B12","first-page":"301","article-title":"The small-world of le petit prince: revisiting the word frequency distribution","volume":"32","author":"Gamermann","year":"2016","journal-title":"Digital Scholarship in the Humanities"},{"issue":"2","key":"2021043000041290800_fqaa007-B13","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1007\/s10791-011-9174-8","article-title":"Opinion-based entity ranking","volume":"15","author":"Ganesan","year":"2011","journal-title":"Information Retrieval"},{"key":"2021043000041290800_fqaa007-B14","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1016\/j.patrec.2014.02.021","article-title":"Computer analysis of similarities between albums in popular music","volume":"45","author":"George","year":"2014","journal-title":"Pattern Recognition 
Letters"},{"year":"2014","author":"Goldberg","key":"2021043000041290800_fqaa007-B15"},{"issue":"1","key":"2021043000041290800_fqaa007-B16","doi-asserted-by":"crossref","first-page":"1171458","DOI":"10.1080\/23311983.2016.1171458","article-title":"A social network analysis of Twitter: mapping the digital humanities community","volume":"3","author":"Grandjean","year":"2016","journal-title":"Cogent Arts & Humanities"},{"issue":"5","key":"2021043000041290800_fqaa007-B17","doi-asserted-by":"crossref","DOI":"10.5210\/fm.v18i5.4529","article-title":"Navigating an imagined middle\u2013earth: finding and analyzing text\u2013based and film\u2013based mental images of middle\u2013earth through theonering. net online fan community","volume":"18","author":"Grek Martin","year":"2013","journal-title":"First Monday"},{"issue":"3","key":"2021043000041290800_fqaa007-B18","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1093\/llc\/fqu007","article-title":"Computer-supported collation of modern manuscripts: Collatex and the Beckett digital manuscript project","volume":"30","author":"Haentjens Dekker","year":"2014","journal-title":"Digital Scholarship in the Humanities"},{"issue":"1","key":"2021043000041290800_fqaa007-B19","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1145\/1656274.1656278","article-title":"The weka data mining software: an update","volume":"11","author":"Hall","year":"2009","journal-title":"ACM SIGKDD Explorations Newsletter"},{"volume-title":"Dutch Linguistics","year":"2005","author":"H\u00fcning","key":"2021043000041290800_fqaa007-B20"},{"issue":"4","key":"2021043000041290800_fqaa007-B21","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1145\/2771588","article-title":"Processing social media messages in mass emergency: a survey","volume":"47","author":"Imran","year":"2015","journal-title":"ACM Computing 
Surveys"},{"first-page":"470","year":"2010","author":"Laniado","key":"2021043000041290800_fqaa007-B22"},{"year":"2008","author":"Lebert","key":"2021043000041290800_fqaa007-B23"},{"key":"2021043000041290800_fqaa007-B24","first-page":"414","article-title":"Umigon: sentiment analysis for tweets based on terms lists and heuristics","volume":"2","author":"Levallois","year":"2013","journal-title":"Second Joint Conference on Lexical and Computational Semantics"},{"issue":"1","key":"2021043000041290800_fqaa007-B25","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1086\/427302","article-title":"Transcendental data: toward a cultural history and aesthetics of the new encoded discourse","volume":"31","author":"Liu","year":"2004","journal-title":"Critical Inquiry"},{"first-page":"55","year":"2014","author":"Manning","key":"2021043000041290800_fqaa007-B26"},{"year":"2002","author":"McCallum","key":"2021043000041290800_fqaa007-B27"},{"year":"2013","author":"Mikolov","key":"2021043000041290800_fqaa007-B28"},{"year":"2012","author":"Mozafari","key":"2021043000041290800_fqaa007-B29"},{"key":"2021043000041290800_fqaa007-B30","first-page":"20","article-title":"The profit in records management","volume":"20","author":"Odell","year":"1956","journal-title":"Systems (New York)"},{"issue":"11","key":"2021043000041290800_fqaa007-B31","doi-asserted-by":"crossref","first-page":"1684","DOI":"10.1016\/j.patrec.2008.04.013","article-title":"WND-CHARM: multi-purpose image classification using compound image transforms","volume":"29","author":"Orlov","year":"2008","journal-title":"Pattern Recognition Letters"},{"year":"2009","author":"Rayson","key":"2021043000041290800_fqaa007-B32"},{"year":"2011","author":"Rehurek","key":"2021043000041290800_fqaa007-B33"},{"issue":"2","key":"2021043000041290800_fqaa007-B34","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1023\/A:1026543900054","article-title":"The earth mover\u2019s distance as a metric for image 
retrieval","volume":"40","author":"Rubner","year":"2000","journal-title":"International Journal of Computer Vision"},{"issue":"2","key":"2021043000041290800_fqaa007-B35","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1037\/0022-3514.38.2.311","article-title":"A description of the affective quality attributed to environments","volume":"38","author":"Russell","year":"1980","journal-title":"Journal of Personality and Social Psychology"},{"issue":"11","key":"2021043000041290800_fqaa007-B36","doi-asserted-by":"crossref","first-page":"1281","DOI":"10.1109\/34.969118","article-title":"Edge, junction, and corner detection using color distributions","volume":"23","author":"Ruzon","year":"2001","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"1","key":"2021043000041290800_fqaa007-B37","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1166\/jmihi.2013.1121","article-title":"Correlation between brain MRI and continuous physiological and environmental traits using 2D global descriptors and multi-order image transforms","volume":"3","author":"Schwartz","year":"2013","journal-title":"Journal of Medical Imaging and Health Informatics"},{"issue":"5","key":"2021043000041290800_fqaa007-B38","doi-asserted-by":"crossref","first-page":"699","DOI":"10.1007\/s11548-011-0550-z","article-title":"A computer analysis method for correlating knee X-rays with continuous indicators","volume":"6","author":"Shamir","year":"2011","journal-title":"International Journal of Computer Assisted Radiology and Surgery"},{"issue":"2","key":"2021043000041290800_fqaa007-B39","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1162\/LEON_a_00281","article-title":"Computer analysis reveals similarities between the artistic styles of Van Gogh and 
Pollock","volume":"45","author":"Shamir","year":"2012","journal-title":"Leonardo"},{"issue":"1","key":"2021043000041290800_fqaa007-B40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1504\/IJART.2015.067389","article-title":"What makes a Pollock Pollock: a machine vision approach","volume":"8","author":"Shamir","year":"2015","journal-title":"IJART"},{"issue":"972","key":"2021043000041290800_fqaa007-B41","doi-asserted-by":"crossref","first-page":"024003","DOI":"10.1088\/1538-3873\/129\/972\/024003","article-title":"Morphology-based query for galaxy image databases","volume":"129","author":"Shamir","year":"2016","journal-title":"Publications of the Astronomical Society of the Pacific"},{"issue":"11","key":"2021043000041290800_fqaa007-B42","doi-asserted-by":"crossref","first-page":"e1000974","DOI":"10.1371\/journal.pcbi.1000974","article-title":"Pattern recognition software and techniques for biological image analysis","volume":"6","author":"Shamir","year":"2010","journal-title":"PLoS computational biology"},{"issue":"10","key":"2021043000041290800_fqaa007-B43","doi-asserted-by":"crossref","first-page":"1307","DOI":"10.1016\/j.joca.2009.04.010","article-title":"Early detection of radiographic knee osteoarthritis using computer-aided analysis","volume":"17","author":"Shamir","year":"2009","journal-title":"Osteoarthritis and Cartilage"},{"issue":"2","key":"2021043000041290800_fqaa007-B44","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1109\/TBME.2008.2006025","article-title":"Knee x-ray image analysis method for automated detection of osteoarthritis","volume":"56","author":"Shamir","year":"2009","journal-title":"IEEE Transactions on Biomedical Engineering"},{"issue":"2","key":"2021043000041290800_fqaa007-B45","first-page":"8","article-title":"Impressionism, expressionism, surrealism: automated recognition of painters and schools of art","volume":"7","author":"Shamir","year":"2010","journal-title":"ACM Transactions on Applied Perception 
(TAP)"},{"issue":"1","key":"2021043000041290800_fqaa007-B46","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/1751-0473-3-13","article-title":"Wndchrm\u2013an open source utility for biological image analysis","volume":"3","author":"Shamir","year":"2008","journal-title":"Source Code for Biology and Medicine"},{"issue":"9","key":"2021043000041290800_fqaa007-B47","doi-asserted-by":"crossref","first-page":"943","DOI":"10.1007\/s11517-008-0380-5","article-title":"IICBU 2008: a proposed benchmark suite for biological image analysis","volume":"46","author":"Shamir","year":"2008","journal-title":"Medical & biological engineering & computing"},{"issue":"1","key":"2021043000041290800_fqaa007-B48","first-page":"107036","article-title":"Progression analysis and stage discovery in continuous physiological processes using image computing","volume":"2010","author":"Shamir","year":"2010","journal-title":"EURASIP Journal on Bioinformatics and Systems Biology"},{"issue":"2","key":"2021043000041290800_fqaa007-B49","first-page":"7","article-title":"Computer analysis of art","volume":"5","author":"Shamir","year":"2012","journal-title":"Journal on Computing and Cultural Heritage (JOCCH)"},{"key":"2021043000041290800_fqaa007-B50","first-page":"274","article-title":"Text analysis and visualization","author":"Sinclair","year":"2016","journal-title":"A New Companion to Digital Humanities"},{"issue":"3","key":"2021043000041290800_fqaa007-B51","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1109\/TKDE.2015.2492549","article-title":"Nearest keyword set search in multi-dimensional datasets","volume":"28","author":"Singh","year":"2016","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"2021043000041290800_fqaa007-B52","first-page":"1","article-title":"Automated readability index","author":"Smith","year":"1967","journal-title":"AMRL-TR: Aerospace Medical Research 
Laboratories"},{"first-page":"1631","year":"2013","author":"Socher","key":"2021043000041290800_fqaa007-B53"},{"issue":"3","key":"2021043000041290800_fqaa007-B54","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1109\/TKDE.2014.2345378","article-title":"Parsimonious topic models with salient word discovery","volume":"27","author":"Soleimani","year":"2015","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"8\u20139","key":"2021043000041290800_fqaa007-B55","first-page":"75","article-title":"The cognitive neuroscience of art: a preliminary fMRI observation","volume":"7","author":"Solso","year":"2000","journal-title":"Journal of Consciousness Studies"},{"issue":"4","key":"2021043000041290800_fqaa007-B56","doi-asserted-by":"crossref","first-page":"1277","DOI":"10.1007\/s13278-012-0079-3","article-title":"Social media and political communication: a social media analytics framework","volume":"3","author":"Stieglitz","year":"2013","journal-title":"Social Network Analysis and Mining"},{"issue":"12","key":"2021043000041290800_fqaa007-B57","doi-asserted-by":"crossref","first-page":"2544","DOI":"10.1002\/asi.21416","article-title":"Sentiment strength detection in short informal text","volume":"61","author":"Thelwall","year":"2010","journal-title":"Journal of the American Society for Information Science and Technology"},{"issue":"2","key":"2021043000041290800_fqaa007-B58","first-page":"435","article-title":"An application of a profile-based method for authorship verification: investigating the authenticity of Pliny the Younger\u2019s letter to Trajan concerning the Christians","volume":"32","author":"Tuccinardi","year":"2016","journal-title":"Digital Scholarship in the Humanities"},{"first-page":"1480","year":"2016","author":"Yang","key":"2021043000041290800_fqaa007-B59"},{"issue":"6","key":"2021043000041290800_fqaa007-B60","doi-asserted-by":"crossref","first-page":"1643","DOI":"10.1109\/TKDE.2014.2377727","article-title":"Probabilistic 
word selection via topic modeling","volume":"27","author":"Zhuang","year":"2015","journal-title":"IEEE Transactions on Knowledge and Data Engineering"}],"container-title":["Digital Scholarship in the Humanities"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/dsh\/article-pdf\/36\/1\/187\/37603493\/fqaa007.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/dsh\/article-pdf\/36\/1\/187\/37603493\/fqaa007.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,4,30]],"date-time":"2021-04-30T00:05:50Z","timestamp":1619741150000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/dsh\/article\/36\/1\/187\/5804949"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,13]]},"references-count":60,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2020,3,13]]},"published-print":{"date-parts":[[2021,4,29]]}},"URL":"https:\/\/doi.org\/10.1093\/llc\/fqaa007","relation":{},"ISSN":["2055-7671","2055-768X"],"issn-type":[{"type":"print","value":"2055-7671"},{"type":"electronic","value":"2055-768X"}],"subject":[],"published-other":{"date-parts":[[2021,4,1]]},"published":{"date-parts":[[2020,3,13]]}}}