{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T15:18:06Z","timestamp":1773155886536,"version":"3.50.1"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2007,9,1]],"date-time":"2007-09-01T00:00:00Z","timestamp":1188604800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2007,9]]},"abstract":"<jats:p>\n            This article proposes an approach for the automatic extraction of transliteration pairs from Chinese Web corpora. In this approach, we formulate the machine transliteration process using a syllable-based phonetic similarity model which consists of phonetic confusion matrices and a Chinese character\n            <jats:italic>n<\/jats:italic>\n            -gram language model. With the phonetic similarity model, the extraction of transliteration pairs becomes a two-step process of\n            <jats:italic>recognition followed by validation:<\/jats:italic>\n            First, in the\n            <jats:italic>recognition<\/jats:italic>\n            process, we identify the most probable transliteration in the\n            <jats:italic>k<\/jats:italic>\n            -neighborhood of a recognized English word. Then, in the\n            <jats:italic>validation<\/jats:italic>\n            process, we qualify the transliteration pair candidates with a hypothesis test. We carry out an analytical study on the statistics of several key factors in English-Chinese transliteration to help formulate phonetic similarity modeling. We then conduct both supervised and unsupervised learning of a phonetic similarity model on a development database. The experimental results validate the effectiveness of the phonetic similarity model by achieving an\n            <jats:italic>F<\/jats:italic>\n            -measure of 0.739 in supervised learning. The unsupervised learning approach works almost as well as the supervised one, thus allowing us to deploy automatic extraction of transliteration pairs in the Web space.\n          <\/jats:p>","DOI":"10.1145\/1282080.1282081","type":"journal-article","created":{"date-parts":[[2007,10,14]],"date-time":"2007-10-14T12:41:11Z","timestamp":1192365671000},"page":"6","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["A phonetic similarity model for automatic extraction of transliteration pairs"],"prefix":"10.1145","volume":"6","author":[{"given":"Jin-Shea","family":"Kuo","sequence":"first","affiliation":[{"name":"National Taiwan University of Science and Technology, Taipei, Taiwan"}]},{"given":"Haizhou","family":"Li","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research, Singapore"}]},{"given":"Ying-Kuei","family":"Yang","sequence":"additional","affiliation":[{"name":"National Taiwan University of Science and Technology, Taiwan"}]}],"member":"320","published-online":{"date-parts":[[2007,9]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073150"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the Natural Language Processing Pacific Rim Symposium, 393--399","author":"Brill E.","unstructured":"Brill , E. , Kacmarcik , G. , and Brockett , C . 2001. Automatically harvesting Katakana-English term pairs from search engine query logs . In Proceedings of the Natural Language Processing Pacific Rim Symposium, 393--399 . Brill, E., Kacmarcik, G., and Brockett, C. 2001. Automatically harvesting Katakana-English term pairs from search engine query logs. In Proceedings of the Natural Language Processing Pacific Rim Symposium, 393--399."},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 7th International World Wide Web Conference, 107--117","author":"Brin S.","unstructured":"Brin , S. and Page , L . 1998. The anatomy of a large-scale hypertextual Web search engine . In Proceedings of the 7th International World Wide Web Conference, 107--117 . Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the 7th International World Wide Web Conference, 107--117."},{"key":"e_1_2_1_4_1","first-page":"263","article-title":"The mathematics of statistical machine translation: Parameter estimation","volume":"19","author":"Brown P. F.","year":"1994","unstructured":"Brown , P. F. , Della Pietra , S. A. , Della Pietra , V. J. , and Mercer , R. L. 1994 . The mathematics of statistical machine translation: Parameter estimation . Computational Linguistics 19 , 2, 263 -- 311 . Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., and Mercer, R. L. 1994. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics 19, 2, 263--311.","journal-title":"Computational Linguistics"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.3115\/992628.992669"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.3115\/1119384.1119385"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1977.tb01600.x"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the International Speech Communication Association Tutorial and Research Workshop of Speech Synthesis, 103--108","author":"Galescu L.","unstructured":"Galescu , L. and Allen , J . 2001. Bi-directional conversion between graphemes and phonemes using a joint N-gram model . In Proceedings of the International Speech Communication Association Tutorial and Research Workshop of Speech Synthesis, 103--108 . Galescu, L. and Allen, J. 2001. Bi-directional conversion between graphemes and phonemes using a joint N-gram model. In Proceedings of the International Speech Communication Association Tutorial and Research Workshop of Speech Synthesis, 103--108."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30211-7_12"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the Human Language Technology Conference\/North American Chapter of the Association for Computational Linguistics Annual Meeting, 281--288","author":"Huang F.","unstructured":"Huang , F. , Vogel , S. , and Waibel , A . 2004. Improving name entity translation combining phonetic and semantic similarities . In Proceedings of the Human Language Technology Conference\/North American Chapter of the Association for Computational Linguistics Annual Meeting, 281--288 . Huang, F., Vogel, S., and Waibel, A. 2004. Improving name entity translation combining phonetic and semantic similarities. In Proceedings of the Human Language Technology Conference\/North American Chapter of the Association for Computational Linguistics Annual Meeting, 281--288."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.3115\/990820.990876"},{"key":"e_1_2_1_12_1","unstructured":"Jurafsky D. and Martin J. H. 2000. Speech and Language Processing. Prentice-Hall Englewood Cliffs NJ 91--188.   Jurafsky D. and Martin J. H. 2000. Speech and Language Processing. Prentice-Hall Englewood Cliffs NJ 91--188."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 2nd International Conference on Language Resource and Evaluation, 1135--1411","author":"Kang B. J.","unstructured":"Kang , B. J. and Choi , K. S . 2000. Automatic transliteration and back-transliteration by decision tree learning . In Proceedings of the 2nd International Conference on Language Resource and Evaluation, 1135--1411 . Kang, B. J. and Choi, K. S. 2000. Automatic transliteration and back-transliteration by decision tree learning. In Proceedings of the 2nd International Conference on Language Resource and Evaluation, 1135--1411."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.3115\/990820.990881"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, ACM","author":"Kleinberg J.","year":"1998","unstructured":"Kleinberg , J. 1998 . Authoritative sources in a hyperlinked environment . In Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, ACM , New York, 14--20. Kleinberg, J. 1998. Authoritative sources in a hyperlinked environment. In Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, ACM, New York, 14--20."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/972764.972767"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.3115\/1219044.1219047"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 18th Pacific Asia Conference on Language, Information and Computation, 275--282","author":"Kuo J. S.","unstructured":"Kuo , J. S. and Yang , Y. K . 2004b. Generating paired transliterated-cognates using multiple pronunciation characteristics from Web corpora . In Proceedings of the 18th Pacific Asia Conference on Language, Information and Computation, 275--282 . Kuo, J. S. and Yang, Y. K. 2004b. Generating paired transliterated-cognates using multiple pronunciation characteristics from Web corpora. In Proceedings of the 18th Pacific Asia Conference on Language, Information and Computation, 275--282."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the International Conference on Chinese Computing, 131--138","author":"Kuo J. S.","unstructured":"Kuo , J. S. and Yang , Y. K . 2005. Incorporating pronunciation variation into extraction of transliterated-term pairs from Web corpora . In Proceedings of the International Conference on Chinese Computing, 131--138 . Kuo, J. S. and Yang, Y. K. 2005. Incorporating pronunciation variation into extraction of transliterated-term pairs from Web corpora. In Proceedings of the International Conference on Chinese Computing, 131--138."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1008992.1009043"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118905.1118922"},{"key":"e_1_2_1_22_1","first-page":"17","article-title":"English to Korea statistical transliteration for information retrieval","volume":"12","author":"Lee J. S.","year":"1998","unstructured":"Lee , J. S. and Choi , K. S. 1998 . English to Korea statistical transliteration for information retrieval . Computer Processing of Oriental Languages 12 , 1, 17 -- 37 . Lee, J. S. and Choi, K. S. 1998. English to Korea statistical transliteration for information retrieval. Computer Processing of Oriental Languages 12, 1, 17--37.","journal-title":"Computer Processing of Oriental Languages"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1218976"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118853.1118870"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 6th Conference of the Association for Machine Translation in the Americas, 177--186","author":"Lin T.","unstructured":"Lin , T. , Wu , J. C. , and Chang , J. S . 2004. Extraction of name and transliteration in monolingual and parallel corpora . In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas, 177--186 . Lin, T., Wu, J. C., and Chang, J. S. 2004. Extraction of name and transliteration in monolingual and parallel corpora. In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas, 177--186."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of Eurospeech'2001","volume":"3","author":"Llitjos A. F.","year":"1919","unstructured":"Llitjos , A. F. and Black , A . 2001. Knowledge of language origin improves pronunciation accuracy of proper names . In Proceedings of Eurospeech'2001 , Vol. 3 , 1919 --1922. Llitjos, A. F. and Black, A. 2001. Knowledge of language origin improves pronunciation accuracy of proper names. In Proceedings of Eurospeech'2001, Vol. 3, 1919--1922."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/568954.568958"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop, 311--314","author":"Meng H.","unstructured":"Meng , H. , Lo , W. K , Chen , B. , and Tang , K . 2001. Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval . In Proceedings of the Automatic Speech Recognition and Understanding Workshop, 311--314 . Meng, H., Lo, W. K, Chen, B., and Tang, K. 2001. Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval. In Proceedings of the Automatic Speech Recognition and Understanding Workshop, 311--314."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1981.tb00272.x"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118037.1118050"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.3115\/1072228.1072327"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the International Conference on Spoken Language Processing","author":"Pagel V.","unstructured":"Pagel , V. , Lenzo , K. , and Black , A . 1998. Letter to sound rules for accented lexicon compression . In Proceedings of the International Conference on Spoken Language Processing , 2015--2020. Pagel, V., Lenzo, K., and Black, A. 1998. Letter to sound rules for accented lexicon compression. In Proceedings of the International Conference on Spoken Language Processing, 2015--2020."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860499"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 3rd International Conference on Language Resources and Evaluation, 499--502","author":"Tsuji K.","unstructured":"Tsuji , K. , Dailley , B. , and Kageura , K . 2002. Extracting French-Japanese word pairs from bilingual corpora based on transliteration rules . In Proceedings of the 3rd International Conference on Language Resources and Evaluation, 499--502 . Tsuji, K., Dailley, B., and Kageura, K. 2002. Extracting French-Japanese word pairs from bilingual corpora based on transliteration rules. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, 499--502."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.3115\/1119384.1119392"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.3115\/980432.980789"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118824.1118838"},{"key":"e_1_2_1_38_1","volume-title":"Chinese Transliteration of Foreign Personal Names","author":"Xinhua News Agency","unstructured":"Xinhua News Agency . 1992. Chinese Transliteration of Foreign Personal Names . The Commercial Press . Xinhua News Agency. 1992. Chinese Transliteration of Foreign Personal Names. The Commercial Press."}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1282080.1282081","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1282080.1282081","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T14:58:18Z","timestamp":1750258698000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1282080.1282081"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,9]]},"references-count":38,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2007,9]]}},"alternative-id":["10.1145\/1282080.1282081"],"URL":"https:\/\/doi.org\/10.1145\/1282080.1282081","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"value":"1530-0226","type":"print"},{"value":"1558-3430","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,9]]},"assertion":[{"value":"2007-09-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}