{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,4]],"date-time":"2025-11-04T22:55:38Z","timestamp":1762296938142,"version":"3.41.0"},"reference-count":18,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2005,3,1]],"date-time":"2005-03-01T00:00:00Z","timestamp":1109635200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2005,3]]},"abstract":"<jats:p>\n            In recent years, various types of tagged corpora have been constructed and much research using tagged corpora has been done. However, tagged corpora contain errors, which impedes the progress of research. Therefore, the correction of errors in corpora is an important research issue. In this study we investigate the correction of such errors, which we call\n            <jats:italic>corpus correction.<\/jats:italic>\n            Using machine-learning methods, we applied corpus correction to a verb modality corpus for machine translation. We used the maximum-entropy and decision-list methods as machine-learning methods. We compared several kinds of methods for corpus correction in our experiments, and determined which is most effective by using a statistical test. We obtained several noteworthy findings: (1) Precision was almost the same for both detection and correction, so it is more convenient to do both correction and detection, rather than detection only. (2) In general, the maximum-entropy method worked better than the decision-list method; but the two methods had almost the same precision for the top 50 pieces of extracted data when closed data was used. 
(3) In terms of precision, the use of closed data was better than the use of open data; however, in terms of the total number of extracted errors, the use of open data was better than the use of closed data. Based on our analysis of these results, we developed a good method for corpus correction. We confirmed the effectiveness of our method by carrying out experiments on machine translation. As corpus-based machine translation continues to be developed, the corpus correction we discuss in this article should prove to be increasingly significant.\n          <\/jats:p>","DOI":"10.1145\/1066078.1066080","type":"journal-article","created":{"date-parts":[[2005,8,3]],"date-time":"2005-08-03T08:30:55Z","timestamp":1123057855000},"page":"18-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Correction of errors in a verb modality corpus for machine translation with a machine-learning method"],"prefix":"10.1145","volume":"4","author":[{"given":"Masaki","family":"Murata","sequence":"first","affiliation":[{"name":"National Institute of Information and Communications Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Masao","family":"Utiyama","sequence":"additional","affiliation":[{"name":"National Institute of Information and Communications Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kiyotaka","family":"Uchimoto","sequence":"additional","affiliation":[{"name":"National Institute of Information and Communications Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hitoshi","family":"Isahara","sequence":"additional","affiliation":[{"name":"National Institute of Information and Communications Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qing","family":"Ma","sequence":"additional","affiliation":[{"name":"Ryukoku University, and National Institute of Information and Communications 
Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2005,3]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Abney S. Schapire R. E. and Singer Y. 1999. Boosting applied to tagging and PP attachment. EMNLP\/VLC-99."},{"volume-title":"An Introduction to Support Vector Machines and Other Kernel-based Learning Methods","author":"Cristianini N.","key":"e_1_2_1_2_1","unstructured":"Cristianini, N. and Shawe-Taylor, J. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press."},{"key":"e_1_2_1_3_1","unstructured":"Eskin E. 2000. Detecting errors within a corpus using anomaly detection. NAACL-2000."},{"volume-title":"Introduction to Statistical Pattern Recognition","author":"Fukunaga K.","key":"e_1_2_1_4_1","unstructured":"Fukunaga, K. 1972. Introduction to Statistical Pattern Recognition. Academic Press."},{"key":"e_1_2_1_5_1","unstructured":"Kume M. Toyoshima T. and Nagata M. 1990. Japanese aspect processing for spoken language translation. In Information Processing Society of Japan the 40th National Convention 1F-7. 415--416. (In Japanese)."},
{"volume-title":"TMI '99","author":"Murata M.","key":"e_1_2_1_6_1","unstructured":"Murata, M., Ma, Q., Uchimoto, K., and Isahara, H. 1999. An example-based approach to Japanese-to-English translation of tense, aspect, and modality. In TMI '99. 66--76."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the ACL Workshop on the Data-Driven Machine Translation. ACM Press","author":"Murata M.","year":"2001","unstructured":"Murata, M., Uchimoto, K., Ma, Q., and Isahara, H. 2001. Using a support-vector machine for Japanese-to-English translation of tense, aspect, and modality. In Proceedings of the ACL Workshop on the Data-Driven Machine Translation. ACM Press, New York. 10.3115\/1118037.1118052"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/568954.568957"},{"key":"e_1_2_1_9_1","unstructured":"Nagao M. 1984. A framework of a mechanical translation between Japanese and English by analogy principle. Artificial and Human Intelligence. 173--180."},{"key":"e_1_2_1_10_1","unstructured":"Pietra S. D. Pietra V. D. and Lafferty J. 1995. Inducing features of random fields. Tech. Rep. CMU-CS-95-144. Carnegie Mellon University."},
{"volume-title":"Maximum entropy modeling for natural language. ACL\/EACL Tutorial Program","author":"Ristad E. S.","key":"e_1_2_1_11_1","unstructured":"Ristad, E. S. 1997. Maximum entropy modeling for natural language. ACL\/EACL Tutorial Program, Madrid."},{"key":"e_1_2_1_12_1","unstructured":"Ristad E. S. 1998. Maximum entropy modeling toolkit. Release 1.6 beta. http:\/\/www.mnemonic.com\/software\/memt."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022607331053"},{"key":"e_1_2_1_14_1","unstructured":"Sato S. 1993. Example-based translation of technical terms. In TMI-93. 58--68."},{"key":"e_1_2_1_15_1","volume-title":"eds","author":"Shimizu M.","year":"1976","unstructured":"Shimizu, M. and Narita, N., eds. 1976. The KODANSHA Japanese-English Dictionary. Kodansha. (In Japanese)."},{"volume-title":"Proceedings of The Institute of Electronics, Information and Communication Engineers, Autumn Convention. D-69","author":"Shirai S.","key":"e_1_2_1_16_1","unstructured":"Shirai, S., Yokoo, A., and Bond, F. 1990. Generation of tense in newspaper translation. In Proceedings of The Institute of Electronics, Information and Communication Engineers, Autumn Convention. D-69. (In Japanese)."},
{"key":"e_1_2_1_17_1","volume-title":"Example-based transfer of Japanese adnominal particles into English. IEICE Trans Information and Systems","author":"Sumita E.","year":"1992","unstructured":"Sumita, E. 1992. Example-based transfer of Japanese adnominal particles into English. IEICE Trans Information and Systems (1992), E75-D(4)."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 32nd Annual Meeting of the Association of the Computational Linguistics. 88--95","author":"Yarowsky D.","year":"1994","unstructured":"Yarowsky, D. 1994. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association of the Computational Linguistics. 88--95. 10.3115\/981732.981745"}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1066078.1066080","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1066078.1066080","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:08:17Z","timestamp":1750262897000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1066078.1066080"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,3]]},"references-count":18,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2005,3]]}},"alternative-id":["10.1145\/1066078.1066080"],"URL":"https:\/\/doi.org\/10.1145\/1066078.1066080","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"type":"print","value":"1530-0226"},{"type":"electronic","value":"1558-3430"}],"subject":[],"published":{"date-parts":[[2005,3]]},"assertion":[{"value":"2005-03-01","order":2,"name":"published","label":"Published","
group":{"name":"publication_history","label":"Publication History"}}]}}