{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T20:10:01Z","timestamp":1736280601542,"version":"3.32.0"},"reference-count":32,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2007,12,1]],"date-time":"2007-12-01T00:00:00Z","timestamp":1196467200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2007,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Problems in the area of text and document processing can often be described as<jats:italic>text rewriting tasks<\/jats:italic>: given an input text, produce a new text by applying some fixed set of rewriting rules. In its simplest form, a rewriting rule is given by a pair of strings, representing a source string (the \u201coriginal\u201d) and its substitute. By a rewriting dictionary, we mean a finite list of such pairs; dictionary-based text rewriting means to replace in an input text occurrences of originals by their substitutes. We present an efficient method for constructing, given a rewriting dictionary<jats:italic>D<\/jats:italic>, a subsequential transducer<jats:private-char><jats:inline-graphic xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" mime-subtype=\"gif\" mimetype=\"image\" xlink:type=\"simple\" xlink:href=\"S1351324905004092_char1\"\/><\/jats:private-char>that accepts any text<jats:italic>t<\/jats:italic>as input and outputs the intended rewriting result under the so-called \u201cleftmost-longest match\u201d replacement with skips,<jats:italic>t<\/jats:italic>'. The time needed to compute the transducer is linear in the size of the input dictionary. Given the transducer, any text<jats:italic>t<\/jats:italic>of length |<jats:italic>t<\/jats:italic>| is rewritten in a deterministic manner in time<jats:italic>O<\/jats:italic>(|<jats:italic>t<\/jats:italic>|+|<jats:italic>t<\/jats:italic>'|), where<jats:italic>t<\/jats:italic>' denotes the resulting output text. Hence the resulting rewriting mechanism is very efficient. As a second advantage, using standard tools, the transducer can be directly composed with other transducers to efficiently solve more complex rewriting tasks in a single processing step.<\/jats:p>","DOI":"10.1017\/s1351324905004092","type":"journal-article","created":{"date-parts":[[2006,2,15]],"date-time":"2006-02-15T08:56:19Z","timestamp":1139993779000},"page":"353-381","source":"Crossref","is-referenced-by-count":1,"title":["Efficient dictionary-based text rewriting using subsequential transducers"],"prefix":"10.1017","volume":"13","author":[{"given":"S.","family":"MIHOV","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"K. U.","family":"SCHULZ","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"56","published-online":{"date-parts":[[2007,12,1]]},"reference":[{"key":"S1351324905004092_ref5","doi-asserted-by":"publisher","DOI":"10.3115\/974499.974526"},{"key":"S1351324905004092_ref1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324997001599"},{"key":"S1351324905004092_ref21","doi-asserted-by":"crossref","unstructured":"Mihov S. and Maurel D. (2001) Direct construction of minimal acyclic subsequential transducers. Proceedings of the Conference on Implementation and Application of Automata CIAA'2000, number 2088 in LNCS, pp. 217\u2013229. Springer.","DOI":"10.1007\/3-540-44674-5_18"},{"key":"S1351324905004092_ref2","doi-asserted-by":"publisher","DOI":"10.1145\/360825.360855"},{"key":"S1351324905004092_ref3","unstructured":"Appelt D. , Hobbs J. R. , Bear J. , Israel D. and Tyson M. (1993) A finite state processor for information extraction from real world text. Proceedings International Joint Conference on Artificial Intelligence, pp. 1172\u20131178."},{"key":"S1351324905004092_ref18","doi-asserted-by":"crossref","first-page":"407","DOI":"10.7551\/mitpress\/3007.003.0016","volume-title":"Finite-State Language Processing","author":"Laporte","year":"1997"},{"key":"S1351324905004092_ref27","doi-asserted-by":"crossref","unstructured":"Porter M. An algorithm for suffix stripping. Program, 14(3): 130\u2013137.","DOI":"10.1108\/eb046814"},{"key":"S1351324905004092_ref6","unstructured":"Brown R. (2000) Example-based machine translation in the pangloss system. Proceedings of COLING'96: The 16th International Conference on Computational Linguistics, pp. 169\u2013174."},{"key":"S1351324905004092_ref14","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4612-1844-9"},{"volume-title":"Data Structures and Algorithms.","year":"1983","author":"Aho","key":"S1351324905004092_ref4"},{"key":"S1351324905004092_ref8","doi-asserted-by":"publisher","DOI":"10.1162\/089120100561601"},{"key":"S1351324905004092_ref20","unstructured":"Maier-Meyer P. (1995) Lexikon und automatische Lemmatisierung. PhD thesis, CIS, Universit\u00e4t M\u00fcnchen."},{"key":"S1351324905004092_ref7","doi-asserted-by":"crossref","unstructured":"Carr L. , Hall W. , Bechhofer S. and Goble C. (2001) Conceptual linking: Ontology-based open hypermedia. Proceedings of the 12th International World Wide Web Conference, pp. 334\u2013342.","DOI":"10.1145\/371920.372084"},{"volume-title":"Introduction to Automata Theory, Languages, and Computation","year":"2000","author":"Hopcroft","key":"S1351324905004092_ref10"},{"key":"S1351324905004092_ref11","doi-asserted-by":"publisher","DOI":"10.3115\/981863.981878"},{"key":"S1351324905004092_ref28","first-page":"227","article-title":"Deterministic part-of-speech tagging with finite state transducers","volume":"22","author":"Roche","year":"1995","journal-title":"Computational Linguistics"},{"key":"S1351324905004092_ref12","doi-asserted-by":"crossref","first-page":"117","DOI":"10.7551\/mitpress\/3007.003.0006","volume-title":"Finite-State Language Processing","author":"Karttunen","year":"1997"},{"key":"S1351324905004092_ref13","first-page":"331","article-title":"Regular models of phonological rule systems","volume":"20","author":"Kaplan","year":"1994","journal-title":"Computational Linguistics"},{"key":"S1351324905004092_ref16","doi-asserted-by":"publisher","DOI":"10.1145\/146370.146380"},{"key":"S1351324905004092_ref15","doi-asserted-by":"crossref","unstructured":"Krovetz B. (1993) Viewing morphology as an inference process. Proceedings of the 16th ACM SIGIR Conference, pp. 191\u2013202.","DOI":"10.1145\/160688.160718"},{"key":"S1351324905004092_ref17","doi-asserted-by":"publisher","DOI":"10.1016\/0885-2308(92)90019-Z"},{"key":"S1351324905004092_ref19","first-page":"212","article-title":"Direct building of minimal automaton for given list","volume":"91","author":"Mihov","year":"1998","journal-title":"Annuaire de l'Universit\u00e9 de Sofia \u201cSt. Kl. Ohridski\u201d"},{"key":"S1351324905004092_ref22","doi-asserted-by":"publisher","DOI":"10.1017\/S135132499600126X"},{"key":"S1351324905004092_ref23","first-page":"269","article-title":"Finite-state transducers in language and speech processing","volume":"23","author":"Mohri","year":"1997","journal-title":"Computational Linguistics"},{"key":"S1351324905004092_ref24","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-015-9719-7_7"},{"key":"S1351324905004092_ref26","doi-asserted-by":"publisher","DOI":"10.3115\/981863.981894"},{"key":"S1351324905004092_ref29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.7551\/mitpress\/3007.001.0001","volume-title":"Finite-State Language Processing","author":"Roche","year":"1997"},{"key":"S1351324905004092_ref9","doi-asserted-by":"crossref","unstructured":"Gerdemann D. and van Noord G. (1999) Transducers from rewrite rules with backreferences. Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics (EACL 99), pp. 126\u2013133.","DOI":"10.3115\/977035.977053"},{"key":"S1351324905004092_ref30","doi-asserted-by":"crossref","unstructured":"Strohmaier C. , Ringlstetter C. , Schulz K. U. and Mihov S. (2003a) Lexical postcorrection of OCR-results: The web as a dynamic secondary dictionary? Proceedings of the International Conference on Document Analysis and Recognition, ICDAR-03, pp. 1133\u20131137.","DOI":"10.1109\/ICDAR.2003.1227833"},{"key":"S1351324905004092_ref25","unstructured":"Mohri M. , Fernando P. and Michael R. (1996) Weighted automata in text and speech processing. Proceedings of the 12th biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended Finite State Models of Language, pp. 152\u2013155."},{"key":"S1351324905004092_ref31","doi-asserted-by":"crossref","unstructured":"Strohmaier C. , Ringlstetter C. , Schulz K. U. and Mihov S. (2003b) A visual and interactive tool for optimizing lexical postcorrection of OCR-results. Proceedings of DIAR-03.","DOI":"10.1109\/CVPRW.2003.10031"},{"key":"S1351324905004092_ref32","doi-asserted-by":"crossref","unstructured":"Vogel S. and Ney H. Translation with cascaded finite-state transducers. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, ACL-2000, pp. 23\u201330.","DOI":"10.3115\/1075218.1075222"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324905004092","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T19:29:12Z","timestamp":1736278152000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324905004092\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,12]]},"references-count":32,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2007,12]]}},"alternative-id":["S1351324905004092"],"URL":"https:\/\/doi.org\/10.1017\/s1351324905004092","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"type":"print","value":"1351-3249"},{"type":"electronic","value":"1469-8110"}],"subject":[],"published":{"date-parts":[[2007,12]]}}}