{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T15:53:52Z","timestamp":1758815632261,"version":"3.41.0"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2018,8,22]],"date-time":"2018-08-22T00:00:00Z","timestamp":1534896000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100010269","name":"Wellcome Trust","doi-asserted-by":"crossref","award":["ISSF"],"award-info":[{"award-number":["ISSF"]}],"id":[{"id":"10.13039\/100010269","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Comput. Cult. Herit."],"published-print":{"date-parts":[[2018,9,5]]},"abstract":"<jats:p>It is of great interest to researchers and scholars in many disciplines (particularly those working on cultural heritage projects) to study parallel passages (i.e., identical or similar pieces of text describing the same thing) in digital text archives. Although there exist a few software tools for this purpose, they are restricted to a specific domain (e.g., the Bible) or a specific language (e.g., Hebrew). In this article, we present in detail how we build a digital infrastructure that can facilitate the search and discovery of parallel passages for any domain in any language. It is at the core of our<jats:italic>Samtla<\/jats:italic>(Search And Mining Tools with Linguistic Analysis) system designed in collaboration with historians and linguists. The system has already been used to support research on five large text corpora that span a number of different domains and languages. 
The key to such a domain-independent and language-independent digital infrastructure is a novel combination of a character-based<jats:italic>n<\/jats:italic>-gram language model, space-optimized suffix tree, and generalized edit distance. A comprehensive evaluation through crowdsourcing shows that the effectiveness of our system\u2019s search functionality is on par with the human-level performance.<\/jats:p>","DOI":"10.1145\/3195727","type":"journal-article","created":{"date-parts":[[2018,8,22]],"date-time":"2018-08-22T12:41:46Z","timestamp":1534941706000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Finding Parallel Passages in Cultural Heritage Archives"],"prefix":"10.1145","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4851-4679","authenticated-orcid":false,"given":"Martyn","family":"Harris","sequence":"first","affiliation":[{"name":"Birkbeck, University of London, Malet Street, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mark","family":"Levene","sequence":"additional","affiliation":[{"name":"Birkbeck, University of London, Malet Street, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dell","family":"Zhang","sequence":"additional","affiliation":[{"name":"Birkbeck, University of London, Malet Street, London, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dan","family":"Levene","sequence":"additional","affiliation":[{"name":"Southampton University, Southampton, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,8,22]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"July","author":"Levene Dan","year":"2016","unstructured":"Prof. Dan Levene . 2016. Discussion on the meaning of the Aramaic related queries identified by Samtla. Personal communication , July 2016 . Prof. Dan Levene. 2016. 
Discussion on the meaning of the Aramaic related queries identified by Samtla. Personal communication, July 2016."},{"volume-title":"Proceedings (ECIR\u201911)","author":"Alonso Omar","key":"e_1_2_1_2_1","unstructured":"Omar Alonso and Ricardo A . Baeza-Yates. 2011. Design and implementation of relevance assessments using crowdsourcing. In Advances in Information Retrieval - 33rd European Conference on IR Research , Proceedings (ECIR\u201911) . 153--164. Omar Alonso and Ricardo A. Baeza-Yates. 2011. Design and implementation of relevance assessments using crowdsourcing. In Advances in Information Retrieval - 33rd European Conference on IR Research, Proceedings (ECIR\u201911). 153--164."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/1484611.1484615"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.comnet.2005.10.020"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of Digital Humanities","author":"Baron Alistair","year":"2009","unstructured":"Alistair Baron , Paul Rayson , and Dawn Archer . 2009 . Automatic standardization of spelling for historical text mining . In Proceedings of Digital Humanities 2009. University of Maryland. Alistair Baron, Paul Rayson, and Dawn Archer. 2009. Automatic standardization of spelling for historical text mining. In Proceedings of Digital Humanities 2009. University of Maryland."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 30th International Computer Archive of Modern and Medieval English Confererence (ICAME\u201909)","author":"Baron Alistair","year":"2009","unstructured":"Alistair Baron , Paul Rayson , and Dawn Archer . 2009 . The extent of spelling variation in early modern English . In Proceedings of the 30th International Computer Archive of Modern and Medieval English Confererence (ICAME\u201909) . Lancaster University, UK. Alistair Baron, Paul Rayson, and Dawn Archer. 2009. The extent of spelling variation in early modern English. 
In Proceedings of the 30th International Computer Archive of Modern and Medieval English Confererence (ICAME\u201909). Lancaster University, UK."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1177\/1354856507084420"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1006\/csla.1999.0128"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.3115\/1219840.1219873"},{"key":"e_1_2_1_10_1","first-page":"4","article-title":"Responsa: A full-text retrieval system with linguistic processing for a 65-million word corpus of jewish heritage in hebrew","volume":"12","author":"Choueka Yaacov","year":"1989","unstructured":"Yaacov Choueka . 1989 . Responsa: A full-text retrieval system with linguistic processing for a 65-million word corpus of jewish heritage in hebrew . Data Engineering 12 , 4 (November 1989), 22--31. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id&equals;86213.86220. Yaacov Choueka. 1989. Responsa: A full-text retrieval system with linguistic processing for a 65-million word corpus of jewish heritage in hebrew. Data Engineering 12, 4 (November 1989), 22--31. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id&equals;86213.86220.","journal-title":"Data Engineering"},{"key":"e_1_2_1_11_1","first-page":"1","article-title":"Taken possession of\u201d: The reprinting and reauthorship of Hawthorne\u2019s \u201cCelestial Railroad\u201d in the antebellum religious press","volume":"7","author":"Cordell Ryan","year":"2013","unstructured":"Ryan Cordell . 2013 . \u201c Taken possession of\u201d: The reprinting and reauthorship of Hawthorne\u2019s \u201cCelestial Railroad\u201d in the antebellum religious press . Digital Humanities Quarterly (DHQ) 7 , 1 (2013), 1 -- 1 . Ryan Cordell. 2013. \u201cTaken possession of\u201d: The reprinting and reauthorship of Hawthorne\u2019s \u201cCelestial Railroad\u201d in the antebellum religious press. 
Digital Humanities Quarterly (DHQ) 7, 1 (2013), 1--1.","journal-title":"Digital Humanities Quarterly (DHQ)"},{"key":"e_1_2_1_12_1","volume-title":"The 2nd International Conference on the Theory and Practice of Digital Libraries.","author":"Croft W. Bruce","year":"1995","unstructured":"W. Bruce Croft , Robert Cook , and Dean Wilder . 1995 . Providing government information on the internet: Experiences with THOMAS . In The 2nd International Conference on the Theory and Practice of Digital Libraries. W. Bruce Croft, Robert Cook, and Dean Wilder. 1995. Providing government information on the internet: Experiences with THOMAS. In The 2nd International Conference on the Theory and Practice of Digital Libraries."},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Anthony C. Davison and D. V. Hinkley. 1997. Bootstrap Methods and Their Application. Cambridge University Press. Retrieved from Anthony C. Davison and D. V. Hinkley. 1997. Bootstrap Methods and Their Application. Cambridge University Press. Retrieved from","DOI":"10.1017\/CBO9780511802843"},{"volume-title":"Isaiah among the Ancient Near Eastern Prophets: A Comparative Study of the Earliest Stages of the Isaiah Tradition and the Neo-Assyrian Prophecies","author":"de Jong Matthijs J.","key":"e_1_2_1_14_1","unstructured":"Matthijs J. de Jong . 2007. Isaiah among the Ancient Near Eastern Prophets: A Comparative Study of the Earliest Stages of the Isaiah Tradition and the Neo-Assyrian Prophecies . Brill . Retrieved from https:\/\/books.google.co.uk\/books?id&equals;TpAQAQAAIAAJ. Matthijs J. de Jong. 2007. Isaiah among the Ancient Near Eastern Prophets: A Comparative Study of the Earliest Stages of the Isaiah Tradition and the Neo-Assyrian Prophecies. Brill. 
Retrieved from https:\/\/books.google.co.uk\/books?id&equals;TpAQAQAAIAAJ."},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1111\/j.2517-6161.1977.tb01624.x","article-title":"Spearman\u2019s footrule as a measure of disarray","volume":"32","author":"Diaconis Persi","year":"1977","unstructured":"Persi Diaconis and R. L. Graham . 1977 . Spearman\u2019s footrule as a measure of disarray . Royal Statistical Society Series B 32 , 24 (1977), 262 -- 268 . Persi Diaconis and R. L. Graham. 1977. Spearman\u2019s footrule as a measure of disarray. Royal Statistical Society Series B 32, 24 (1977), 262--268.","journal-title":"Royal Statistical Society Series B"},{"volume-title":"Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 28--36","author":"Fagin Ronald","key":"e_1_2_1_16_1","unstructured":"Ronald Fagin , Ravi Kumar , and D. Sivakumar . 2003. Comparing top k lists . In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 28--36 . Ronald Fagin, Ravi Kumar, and D. Sivakumar. 2003. Comparing top k lists. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 28--36."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/2914178.2914228"},{"key":"e_1_2_1_18_1","first-page":"1","article-title":"Building better digital humanities tools: Toward broader audiences and user-centered designs","volume":"6","author":"Gibbs Fred","year":"2012","unstructured":"Fred Gibbs and Trevor Owens . 2012 . Building better digital humanities tools: Toward broader audiences and user-centered designs . Digital Humanities Quarterly 6 , 2 (2012), 1 -- 1 . Fred Gibbs and Trevor Owens. 2012. Building better digital humanities tools: Toward broader audiences and user-centered designs. 
Digital Humanities Quarterly 6, 2 (2012), 1--1.","journal-title":"Digital Humanities Quarterly"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1201\/b17888"},{"volume-title":"Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology","author":"Gusfield Dan","key":"e_1_2_1_20_1","unstructured":"Dan Gusfield . 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology . Cambridge University Press . Dan Gusfield. 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/2740769.2740796"},{"key":"e_1_2_1_22_1","volume-title":"The anatomy of a search and mining system for digital archives. CoRR abs\/1603.07150","author":"Harris Martyn","year":"2016","unstructured":"Martyn Harris , Mark Levene , Dell Zhang , and Dan Levene . 2016. The anatomy of a search and mining system for digital archives. CoRR abs\/1603.07150 ( 2016 ). Retrieved from http:\/\/arxiv.org\/abs\/1603.07150. Martyn Harris, Mark Levene, Dell Zhang, and Dan Levene. 2016. The anatomy of a search and mining system for digital archives. CoRR abs\/1603.07150 (2016). Retrieved from http:\/\/arxiv.org\/abs\/1603.07150."},{"key":"e_1_2_1_23_1","volume-title":"the Workshop on Computer Interaction and Information Retrieval (HCIR\u201908)","author":"Hearst Marti A.","year":"2008","unstructured":"Marti A. Hearst . 2008 . UIs for faceted navigation: Recent advances and remaining open problems . In the Workshop on Computer Interaction and Information Retrieval (HCIR\u201908) . Marti A. Hearst. 2008. UIs for faceted navigation: Recent advances and remaining open problems. 
In the Workshop on Computer Interaction and Information Retrieval (HCIR\u201908)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2436256.2436263"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835449.1835499"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582418"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1229179.1229181"},{"volume-title":"Linguistic Peculiarities in the Aramaic Magic Bowl Texts","author":"Juusola Hannu","key":"e_1_2_1_28_1","unstructured":"Hannu Juusola and Suomen It\u00e4mainen Seura . 1999. Linguistic Peculiarities in the Aramaic Magic Bowl Texts . Finnish Oriental Society . Retrieved from https:\/\/books.google.co.uk\/books?id&equals;I0FZAAAAMAAJ. Hannu Juusola and Suomen It\u00e4mainen Seura. 1999. Linguistic Peculiarities in the Aramaic Magic Bowl Texts. Finnish Oriental Society. Retrieved from https:\/\/books.google.co.uk\/books?id&equals;I0FZAAAAMAAJ."},{"volume-title":"Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI\u201916)","author":"Kim Yoon","key":"e_1_2_1_29_1","unstructured":"Yoon Kim , Yacine Jernite , David Sontag , and Alexander M. Rush . 2016. Character-aware neural language models . In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI\u201916) . 2741--2749. Yoon Kim, Yacine Jernite, David Sontag, and Alexander M. Rush. 2016. Character-aware neural language models. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI\u201916). 
2741--2749."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1357054.1357127"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqv052"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/1557769.1557821"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 5th IEEE International Enterprise Distributed Object Computing Conference","author":"Leff Avraham","year":"2001","unstructured":"Avraham Leff and James T. Rayfield . 2001. Web-application development using the model\/view\/controller design pattern . In Proceedings of the 5th IEEE International Enterprise Distributed Object Computing Conference , 2001 (EDOC\u201901). 118--127. Avraham Leff and James T. Rayfield. 2001. Web-application development using the model\/view\/controller design pattern. In Proceedings of the 5th IEEE International Enterprise Distributed Object Computing Conference, 2001 (EDOC\u201901). 118--127."},{"key":"e_1_2_1_34_1","volume-title":"An Introduction to Search Engines and Web Navigation","author":"Levene Mark","unstructured":"Mark Levene . 2010. An Introduction to Search Engines and Web Navigation ( 2 nd ed.). John Wiley & Sons, Hoboken, NJ. Mark Levene. 2010. An Introduction to Search Engines and Web Navigation (2nd ed.). John Wiley & Sons, Hoboken, NJ.","edition":"2"},{"key":"e_1_2_1_35_1","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions and reversals","volume":"10","author":"Levenshtein Vladimir I.","year":"1966","unstructured":"Vladimir I. Levenshtein . 1966 . Binary codes capable of correcting deletions, insertions and reversals . Soviet Physics Doklady 10 (1966), 707 . Vladimir I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions and reversals. 
Soviet Physics Doklady 10 (1966), 707.","journal-title":"Soviet Physics Doklady"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Lison Pierre","year":"2016","unstructured":"Pierre Lison and J\u00f6rg Tiedemann . 2016 . OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles . In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916) . Pierre Lison and J\u00f6rg Tiedemann. 2016. OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1002\/aris.1440390108"},{"volume-title":"Introduction to Information Retrieval","author":"Manning Christopher","key":"e_1_2_1_38_1","unstructured":"Christopher Manning , Prabhakar Raghavan , and Hinrich Sch\u00fctze . 2008. Introduction to Information Retrieval . Cambridge University Press . Christopher Manning, Prabhakar Raghavan, and Hinrich Sch\u00fctze. 2008. Introduction to Information Retrieval. Cambridge University Press."},{"volume-title":"Foundations of Statistical Natural Language Processing","author":"Manning Chris","key":"e_1_2_1_39_1","unstructured":"Chris Manning and Hinrich Schutze . 1999. Foundations of Statistical Natural Language Processing . MIT Press , Cambridge, MA . Chris Manning and Hinrich Schutze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:INRT.0000009441.78971.be"},{"key":"e_1_2_1_41_1","volume-title":"Tweedie","author":"Meyn Sean","year":"2009","unstructured":"Sean Meyn and Richard L . Tweedie . 2009 . Markov Chains and Stochastic Stability (2nd ed.). Cambridge University Press , New York. Sean Meyn and Richard L. Tweedie. 2009. 
Markov Chains and Stochastic Stability (2nd ed.). Cambridge University Press, New York."},{"key":"e_1_2_1_42_1","volume-title":"Advances in Neural Information Processing Systems 26: Annual Conference on Neural Information Processing Systems (NIPS\u201913)","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov , Ilya Sutskever , Kai Chen , Gregory S. Corrado , and Jeffrey Dean . 2013 . Distributed representations of words and phrases and their compositionality . In Advances in Neural Information Processing Systems 26: Annual Conference on Neural Information Processing Systems (NIPS\u201913) . 3111--3119. Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26: Annual Conference on Neural Information Processing Systems (NIPS\u201913). 3111--3119."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/262192.262203"},{"volume-title":"Aramaic Incantation Texts from Nippur","author":"Montgomery James A.","key":"e_1_2_1_44_1","unstructured":"James A. Montgomery . 1913. Aramaic Incantation Texts from Nippur . University Museum. Retrieved from https :\/\/books.google.co.uk\/books?id&equals;qg0TAAAAYAAJ. James A. Montgomery. 1913. Aramaic Incantation Texts from Nippur. University Museum. Retrieved from https:\/\/books.google.co.uk\/books?id&equals;qg0TAAAAYAAJ."},{"key":"e_1_2_1_45_1","volume-title":"Machine Learning: A Probabilistic Perspective","author":"Murphy Kevin P.","year":"2012","unstructured":"Kevin P. Murphy . 2012 . Machine Learning: A Probabilistic Perspective . MIT Press . Kevin P. Murphy. 2012. Machine Learning: A Probabilistic Perspective. MIT Press."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.5555\/2961886"},{"volume-title":"A Companion to Digital Humanities","author":"Rommel Thomas","key":"e_1_2_1_47_1","unstructured":"Thomas Rommel . 2007. 
Literary studies . In A Companion to Digital Humanities . Blackwell Publishing , 88--96. Thomas Rommel. 2007. Literary studies. In A Companion to Digital Humanities. Blackwell Publishing, 88--96."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.880083"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148261"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1504\/IJBRA.2008.017165"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1321440.1321528"},{"key":"e_1_2_1_52_1","volume-title":"Strauss and George Eliot","author":"David","year":"1860","unstructured":"David F. Strauss and George Eliot . 1860 . The Life of Jesus: Critically Examined . Number v. 1 in The Life of Jesus. C. Blanchard. Retrieved from https:\/\/books.google.co.uk\/books?id&equals;RmdLqnfw1OgC. David F. Strauss and George Eliot. 1860. The Life of Jesus: Critically Examined. Number v. 1 in The Life of Jesus. C. Blanchard. Retrieved from https:\/\/books.google.co.uk\/books?id&equals;RmdLqnfw1OgC."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33290-6_8"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073012.1073079"},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies -","volume":"1","author":"Omar","year":"2002","unstructured":"Omar F. Zaidan and Chris Callison-Burch. 2011. Crowdsourcing translation: Professional quality from non-professionals . In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (HLT\u201911). Association for Computational Linguistics, Stroudsburg, PA, 1220--1229. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id&equals; 2002 472.2002626. Omar F. Zaidan and Chris Callison-Burch. 2011. Crowdsourcing translation: Professional quality from non-professionals. 
In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (HLT\u201911). Association for Computational Linguistics, Stroudsburg, PA, 1220--1229. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id=2002472.2002626."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000008"},{"volume-title":"Statistical Language Models for Information Retrieval. Morgan & Claypool Publishers","author":"Zhai ChengXiang","key":"e_1_2_1_57_1","unstructured":"ChengXiang Zhai . 2009. Statistical Language Models for Information Retrieval. Morgan & Claypool Publishers , San Francisco . ChengXiang Zhai. 2009. Statistical Language Models for Information Retrieval. Morgan & Claypool Publishers, San Francisco."},{"key":"e_1_2_1_58_1","volume-title":"Proceedings of the Workshop on Language Models for Information Retrieval (LMIR\u201901)","author":"Zhai Chengxiang","year":"2001","unstructured":"Chengxiang Zhai and John Lafferty . 2001 . The dual role of smoothing in the language modeling approach . In Proceedings of the Workshop on Language Models for Information Retrieval (LMIR\u201901) . 31--36. Chengxiang Zhai and John Lafferty. 2001. The dual role of smoothing in the language modeling approach. In Proceedings of the Workshop on Language Models for Information Retrieval (LMIR\u201901). 31--36."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/984321.984322"},{"key":"e_1_2_1_60_1","volume-title":"Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems (NIPS\u201915)","author":"Zhang Xiang","year":"2015","unstructured":"Xiang Zhang , Junbo Jake Zhao , and Yann LeCun . 2015 . Character-level convolutional networks for text classification . In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems (NIPS\u201915) . 649--657. 
Xiang Zhang, Junbo Jake Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems (NIPS\u201915). 649--657."}],"container-title":["Journal on Computing and Cultural Heritage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3195727","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3195727","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:26:35Z","timestamp":1750213595000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3195727"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8,22]]},"references-count":60,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2018,9,5]]}},"alternative-id":["10.1145\/3195727"],"URL":"https:\/\/doi.org\/10.1145\/3195727","relation":{},"ISSN":["1556-4673","1556-4711"],"issn-type":[{"type":"print","value":"1556-4673"},{"type":"electronic","value":"1556-4711"}],"subject":[],"published":{"date-parts":[[2018,8,22]]},"assertion":[{"value":"2017-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-08-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}