{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T16:37:19Z","timestamp":1732034239772},"reference-count":65,"publisher":"Cambridge University Press (CUP)","issue":"2","license":[{"start":{"date-parts":[[2012,7,24]],"date-time":"2012-07-24T00:00:00Z","timestamp":1343088000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2013,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Spelling errors in digital documents are often caused by operational and cognitive mistakes, or by the lack of full knowledge about the language of the written documents. Computer-assisted solutions can help to detect and suggest replacements. In this paper, we present a new string distance metric for the Persian language to rank respelling suggestions of a misspelled Persian word by considering the effects of keyboard layout on typographical spelling errors as well as the homomorphic and homophonic aspects of words for orthographical misspellings. We also consider the misspellings caused by disregarded diacritics. Since the proposed string distance metric is custom-designed for the Persian language, we present the spelling aspects of the Persian language such as homomorphs, homophones, and diacritics. We then present our statistical analysis of a set of large Persian corpora to identify the causes and the types of Persian spelling errors. We show that the proposed string distance metric has a higher mean average precision and a higher mean reciprocal rank in ranking respelling candidates of Persian misspellings in comparison with other metrics such as the Hamming, Levenshtein, Damerau\u2013Levenshtein, Wagner\u2013Fischer, and Jaro\u2013Winkler metrics.<\/jats:p>","DOI":"10.1017\/s1351324912000186","type":"journal-article","created":{"date-parts":[[2012,7,24]],"date-time":"2012-07-24T09:27:07Z","timestamp":1343122027000},"page":"259-284","source":"Crossref","is-referenced-by-count":15,"title":["A novel string distance metric for ranking Persian respelling suggestions"],"prefix":"10.1017","volume":"19","author":[{"given":"OMID","family":"KASHEFI","sequence":"first","affiliation":[]},{"given":"MOHSEN","family":"SHARIFI","sequence":"additional","affiliation":[]},{"given":"BEHROOZ","family":"MINAIE","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2012,7,24]]},"reference":[{"key":"S1351324912000186_ref1","doi-asserted-by":"publisher","DOI":"10.3758\/BF03196972"},{"key":"S1351324912000186_ref33","doi-asserted-by":"publisher","DOI":"10.4324\/9780203192887"},{"key":"S1351324912000186_ref10","doi-asserted-by":"publisher","DOI":"10.1145\/363958.363994"},{"key":"S1351324912000186_ref27","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009902609570"},{"key":"S1351324912000186_ref22","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2003.1232265"},{"key":"S1351324912000186_ref45","volume-title":"Computer Programs for Spelling Correction","author":"Peterson","year":"1980"},{"key":"S1351324912000186_ref49","volume-title":"Proceedings of the 7th International Language Processing and Knowledge Engineering (NLP-KE)","author":"Rasooli","year":"2011"},{"key":"S1351324912000186_ref54","first-page":"118","article-title":"Research in spelling and handwriting","volume":"19","author":"Stauffer","year":"1949","journal-title":"Review of Educational Research"},{"key":"S1351324912000186_ref9","doi-asserted-by":"publisher","DOI":"10.1002\/asi.10354"},{"key":"S1351324912000186_ref46","doi-asserted-by":"publisher","DOI":"10.1145\/6138.6146"},{"key":"S1351324912000186_ref29","volume-title":"Unicode Explained","author":"Korpela","year":"2006"},{"key":"S1351324912000186_ref64","first-page":"113","volume-title":"Electronics Magazine","author":"Yianilos","year":"1983"},{"key":"S1351324912000186_ref8","first-page":"4","article-title":"A Singaporean corpus of misspellings: analysis and implications","volume":"3","author":"Brown","year":"1988","journal-title":"Journal of the Simplified Spelling Society"},{"key":"S1351324912000186_ref13","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(82)90001-2"},{"key":"S1351324912000186_ref31","volume-title":"A Grammar of Contemporary Persian","author":"Lazard","year":"2012"},{"key":"S1351324912000186_ref39","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(87)90116-6"},{"key":"S1351324912000186_ref57","doi-asserted-by":"publisher","DOI":"10.1016\/S0019-9958(85)80046-2"},{"key":"S1351324912000186_ref50","doi-asserted-by":"publisher","DOI":"10.1109\/34.682181"},{"key":"S1351324912000186_ref17","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1950.tb00463.x"},{"key":"S1351324912000186_ref7","volume-title":"Proceedings of the 38th Annual Meeting on Association for Computational Linguistics","author":"Brill","year":"2000"},{"key":"S1351324912000186_ref18","doi-asserted-by":"publisher","DOI":"10.1016\/0031-3203(76)90027-3"},{"key":"S1351324912000186_ref59","doi-asserted-by":"publisher","DOI":"10.1145\/321796.321811"},{"key":"S1351324912000186_ref2","doi-asserted-by":"publisher","DOI":"10.1145\/363282.363326"},{"key":"S1351324912000186_ref23","doi-asserted-by":"publisher","DOI":"10.1109\/ITCC.2002.1000354"},{"key":"S1351324912000186_ref63","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(83)90045-6"},{"key":"S1351324912000186_ref3","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2009.05.002"},{"key":"S1351324912000186_ref56","volume-title":"Proceedings of the 40th Annual Meeting on Association for Computational Linguistics","author":"Toutanova","year":"2002"},{"key":"S1351324912000186_ref4","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(83)90022-5"},{"key":"S1351324912000186_ref5","volume-title":"Proceedings of the Second Conference on Applied Natural Language Processing","author":"Berkel","year":"1988"},{"key":"S1351324912000186_ref6","volume-title":"Proceedings of the Eastern Joint IRE-AIEE-ACM Computer Conference","author":"Bledsoe","year":"1959"},{"key":"S1351324912000186_ref11","first-page":"117","volume-title":"School and Society","author":"Davis","year":"1922"},{"key":"S1351324912000186_ref12","first-page":"401","volume-title":"Developing a Spell-Checker for Tajik Using RAISE","author":"Davrondjon","year":"2002"},{"key":"S1351324912000186_ref14","first-page":"257","article-title":"On the need for parsing ill-formed input","volume":"7","author":"Eastman","year":"1981","journal-title":"Computational Linguistics"},{"key":"S1351324912000186_ref65","volume-title":"Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Zobel","year":"1996"},{"key":"S1351324912000186_ref15","volume-title":"Proceedings of the 3rd Workshop on Very Large Corpora","author":"Golding","year":"1996"},{"key":"S1351324912000186_ref16","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007545901558"},{"key":"S1351324912000186_ref19","volume-title":"The Thirteen Books of Euclids Elements","author":"Heath","year":"1956"},{"key":"S1351324912000186_ref55","doi-asserted-by":"publisher","DOI":"10.1111\/j.2044-8295.1983.tb01867.x"},{"key":"S1351324912000186_ref20","doi-asserted-by":"publisher","DOI":"10.1145\/360825.360861"},{"key":"S1351324912000186_ref21","unstructured":"Hodge V. J. , and Austin J. 2001. An evaluation of phonetic spell checkers. Technical Report ycs 338, Department of Computer Science, University of York, York, UK."},{"key":"S1351324912000186_ref24","doi-asserted-by":"publisher","DOI":"10.1023\/B:READ.0000044368.17444.7d"},{"key":"S1351324912000186_ref25","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1989.10478785"},{"key":"S1351324912000186_ref26","doi-asserted-by":"publisher","DOI":"10.1002\/sim.4780140510"},{"key":"S1351324912000186_ref28","first-page":"74","volume-title":"Evaluation Parameters","author":"Keen","year":"1971"},{"key":"S1351324912000186_ref30","doi-asserted-by":"publisher","DOI":"10.1145\/146370.146380"},{"key":"S1351324912000186_ref34","doi-asserted-by":"publisher","DOI":"10.1016\/0022-0000(80)90002-1"},{"key":"S1351324912000186_ref35","volume-title":"Proceedings of the 2nd Conference on Applied Natural Language Processing","author":"Means","year":"1988"},{"key":"S1351324912000186_ref36","volume-title":"Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing)","author":"Megerdoomian","year":"2000"},{"key":"S1351324912000186_ref37","volume-title":"Proceedings of the Workshop on Computational Approaches to Arabic Script-Based Languages","author":"Megerdoomian","year":"2004"},{"key":"S1351324912000186_ref38","volume-title":"Proceedings of the 2nd International Conference on Language Resources and Evaluation","author":"Min","year":"2000"},{"key":"S1351324912000186_ref40","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324908004804"},{"key":"S1351324912000186_ref43","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1080\/0031383730170109","article-title":"Types of orthographic error","volume":"17","author":"Ola","year":"1973","journal-title":"Scandinavian Journal of Educational Research"},{"key":"S1351324912000186_ref41","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-007-9028-6"},{"key":"S1351324912000186_ref42","unstructured":"Odell M. K. , and Russell R. C. 1918. US Patents Nos. 1261167 (1918) and 1435663 (1922). Washington, DC: US Patent and Trademark Office."},{"key":"S1351324912000186_ref32","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions and reversals","volume":"10","author":"Levenshtein","year":"1966","journal-title":"Soviet Physics Doklady"},{"key":"S1351324912000186_ref44","doi-asserted-by":"publisher","DOI":"10.1145\/359038.359041"},{"key":"S1351324912000186_ref47","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630340108"},{"key":"S1351324912000186_ref48","doi-asserted-by":"publisher","DOI":"10.1145\/358027.358048"},{"key":"S1351324912000186_ref51","doi-asserted-by":"publisher","DOI":"10.1037\/0033-2909.99.3.303"},{"key":"S1351324912000186_ref52","volume-title":"Proceedings of the Conference on Language Engineering","author":"Shaalan","year":"2003"},{"key":"S1351324912000186_ref53","volume-title":"Proceedings of the International Conference on Language Resources and Evaluation","author":"Shamsfard","year":"2010"},{"key":"S1351324912000186_ref58","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/20.2.141"},{"key":"S1351324912000186_ref60","volume-title":"The State of Record Linkage and Current Research Problems","author":"Winkler","year":"1999"},{"key":"S1351324912000186_ref61","unstructured":"Winkler W. , and Thibaudeau Y. 1991. An application of the Fellegi-Sunter model of record linkage to the 1990 US decennial census. Research Report RR91\/09, US Bureau of the Census, Washington, DC."},{"key":"S1351324912000186_ref62","doi-asserted-by":"publisher","DOI":"10.1007\/BF00555366"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324912000186","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,4,24]],"date-time":"2019-04-24T18:55:53Z","timestamp":1556132153000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324912000186\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,7,24]]},"references-count":65,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2013,4]]}},"alternative-id":["S1351324912000186"],"URL":"https:\/\/doi.org\/10.1017\/s1351324912000186","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,7,24]]}}}