{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,5]],"date-time":"2024-08-05T18:42:30Z","timestamp":1722883350644},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2006,6]]},"abstract":"<jats:p>Named entity (NE) extraction is one of the fundamental tasks in natural language processing (NLP). Although many studies have focused on identifying NEs within monolingual documents, aligning NEs in bilingual documents has not been investigated extensively due to the complexity of the task. In this article we introduce a new approach to aligning bilingual NEs in parallel corpora by incorporating statistical models with multiple knowledge sources. In our approach, we model the process of translating an English NE phrase into a Chinese equivalent using lexical translation\/transliteration probabilities for word translation and alignment probabilities for word reordering. The method involves automatically learning phrase alignment and acquiring word translations from a bilingual phrase dictionary and parallel corpora, and automatically discovering transliteration transformations from a training set of name-transliteration pairs. The method also involves language-specific knowledge functions, including handling abbreviations, recognizing Chinese personal names, and expanding acronyms. At runtime, the proposed models are applied to each source NE in a pair of bilingual sentences to generate and evaluate the target NE candidates; the source and target NEs are then aligned based on the computed probabilities. 
Experimental results demonstrate that the proposed approach, which integrates statistical models with extra knowledge sources, is highly feasible and offers significant improvement in performance compared to our previous work, as well as the traditional approach of IBM Model 4.<\/jats:p>","DOI":"10.1145\/1165255.1165257","type":"journal-article","created":{"date-parts":[[2006,10,18]],"date-time":"2006-10-18T18:11:32Z","timestamp":1161195092000},"page":"121-145","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["Alignment of bilingual named entities in parallel corpora using statistical models and multiple knowledge sources"],"prefix":"10.1145","volume":"5","author":[{"given":"Chun-Jen","family":"Lee","sequence":"first","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}]},{"given":"Jason S.","family":"Chang","sequence":"additional","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}]},{"given":"Jyh-Shing R.","family":"Jang","sequence":"additional","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}]}],"member":"320","published-online":{"date-parts":[[2006,6]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). 400--408","author":"Al-Onaizan Y."},{"key":"e_1_2_1_2_1","unstructured":"BCE. 2003. Britannica Concise Encyclopedia. http:\/\/wordpedia.britannica.com\/concise\/."},{"key":"e_1_2_1_3_1","unstructured":"BDC. 1992. The BDC Chinese-English Electronic Dictionary (version 2.0). Behavior Design Corp. Taiwan."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007558221122"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the Sixth Conference on Natural Language Learning (CoNLL-2002","author":"Black W. J.","year":"2002"},{"key":"e_1_2_1_6_1","unstructured":"Borthwick A. 1999. A maximum entropy approach to named entity recognition. Ph.D. dissertation New York University."},{"key":"e_1_2_1_7_1","first-page":"263","article-title":"The mathematics of statistical machine translation: Parameter estimation","volume":"19","author":"Brown P. F.","year":"1993","journal-title":"Comput. Linguist."},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the Sixth Conference on Natural Language Learning (CoNLL-2002","author":"Carreras X.","year":"2002"},{"key":"e_1_2_1_9_1","first-page":"43","article-title":"Statistical translation model for phrases","volume":"6","author":"Chang J. S.","year":"2001","journal-title":"Comput. Linguist. Chinese Lang. Process."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 7th Message Understanding Conference (MUC-7).","author":"Chen H.-H."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 17th COLING and 36th ACL Conference. 232--236","author":"Chen H.-H."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-language Named Entity Recognition. 1--8. 10","author":"Chen H.-H.","year":"2003"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the COLING Conference. 101--107","author":"Chen K.-J.","year":"2066"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the 27th ACM International Conference on Research and Development in Information Retrieval (SIGIR). 
10","author":"Cheng P-J."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the ROCLING XII Conference","author":"Chien L.-F."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 7th Message Understanding Conference (MUC-7).","author":"Chinchor N.","year":"1997"},{"key":"e_1_2_1_17_1","first-page":"21","article-title":"Adaptive bilingual sentence alignment","volume":"2499","author":"Chuang T. C.","year":"2002","journal-title":"Lecture Notes in Artificial Intelligence"},{"key":"e_1_2_1_18_1","unstructured":"CNA. 2003. Central News Agency. http:\/\/client.cna.com.tw."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/363958.363994"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the ACL Workshop on Multilingual and Mixed-language (NER","author":"Fei H.","year":"2003"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP","author":"Feng D.","year":"2004"},{"key":"e_1_2_1_22_1","unstructured":"Huai L. 1989. Handbook of English Name Knowledge 1st ed."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1162\/089120103322711569"},{"key":"e_1_2_1_24_1","first-page":"599","article-title":"Machine transliteration","volume":"24","author":"Knight K.","year":"1998","journal-title":"Comput. 
Linguist."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1162\/089120103322711587"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the First International Joint Conference on Natural Language Processing (IJCNLP-0","author":"Kumano T."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond","author":"Lee C.-J.","year":"2003"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 17th Pacific Asia Conference on Language, Information, and Computation (PACLIC","author":"Lee C.-J."},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of IJCNLP-04 Workshop on Named Entity Recognition for Natural Language Processing Applications","author":"Lee C.-J."},{"key":"e_1_2_1_30_1","first-page":"144","article-title":"Alignment of bilingual named entities in parallel corpora using statistical model","volume":"3265","author":"Lee C.-J.","year":"2004","journal-title":"Lecture Notes in Artificial Intelligence"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2004.10.006"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 2nd International Workshop on Information Retrieval with Asian Languages (IRAL","author":"Lee J. S."},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the Sixth Conference on Natural Language Learning (CoNLL-2002","author":"Lin W.-H.","year":"2002"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/984321.984324"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 7th Message Understanding Conference (MUC-7).","author":"Mikheev A."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics","author":"Moore R. C.","year":"2003"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 22nd ACM International Conference on Research and Development in Information Retrieval (SIGIR). 
10","author":"Nie J.-Y."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1162\/089120103321337421"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 19th International Conference on Computational Linguistics (COLING","author":"Oh J.-H."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1162\/089120103322711578"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the Seventh Conference on Natural Language Learning (CoNLL-2003","author":"Sang E. F.","year":"2003"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the Pacific Symposium on Biocomputing (PSB).","author":"Schwartz A. S."},{"key":"e_1_2_1_43_1","unstructured":"Sinorama. 2002. Sinorama Magazine. http:\/\/www.greatman.com.tw\/sinorama.htm."},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the COLING\/ACL Workshop on Computational Approaches to Semitic Languages.","author":"Stalls B. G."},{"key":"e_1_2_1_45_1","first-page":"1","article-title":"A class-based language model approach to Chinese named entity identification","volume":"8","author":"Sun J.","year":"2003","journal-title":"Comput. Linguist. Chinese Lang. Process."},{"key":"e_1_2_1_46_1","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1142\/S0219427902000649","article-title":"Automatic extraction of translational Japanese-KATAKANA and English word pairs from bilingual corpora","volume":"15","author":"Tsuji K.","year":"2002","journal-title":"Int. J. Comput. Process. Oriental Lang."},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 17th COLING and 36th ACL. 1352--1356","author":"Wan S."},{"key":"e_1_2_1_48_1","first-page":"1","article-title":"Bilingual collocation extraction based on syntactic and statistical analyses","volume":"9","author":"Wu C.-C.","year":"2004","journal-title":"Comput. Linguist. Chinese Lang. 
Process."},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the Sixth Conference on Natural Language Learning (CoNLL-2002","author":"Wu D.","year":"2002"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.10261"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the 27th ACM International Conference on Research and Development in Information Retrieval (SIGIR). 10","author":"Zhang Y."},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL","author":"Zhou G."}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1165255.1165257","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T19:57:58Z","timestamp":1672257478000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1165255.1165257"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,6]]},"references-count":52,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2006,6]]}},"alternative-id":["10.1145\/1165255.1165257"],"URL":"https:\/\/doi.org\/10.1145\/1165255.1165257","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"value":"1530-0226","type":"print"},{"value":"1558-3430","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,6]]},"assertion":[{"value":"2006-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}