{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,27]],"date-time":"2026-04-27T23:17:41Z","timestamp":1777331861725,"version":"3.51.4"},"reference-count":6,"publisher":"MIT Press - Journals","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computational Linguistics"],"published-print":{"date-parts":[[2005,12]]},"abstract":"<jats:p> We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system. We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. Thus, our method can be applied with great benefit to language pairs for which only scarce resources are available. <\/jats:p>","DOI":"10.1162\/089120105775299168","type":"journal-article","created":{"date-parts":[[2006,1,12]],"date-time":"2006-01-12T21:18:40Z","timestamp":1137100720000},"page":"477-504","source":"Crossref","is-referenced-by-count":125,"title":["Improving Machine Translation Performance by Exploiting Non-Parallel Corpora"],"prefix":"10.1162","volume":"31","author":[{"given":"Dragos Stefan","family":"Munteanu","sequence":"first","affiliation":[{"name":"Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292"}]},{"given":"Daniel","family":"Marcu","sequence":"additional","affiliation":[{"name":"Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292"}]}],"member":"281","reference":[{"issue":"2","key":"p_2","first-page":"79","volume":"16","author":"Brown Peter F","year":"1990","journal-title":"Computational Linguistics"},{"issue":"2","key":"p_3","first-page":"263","volume":"19","author":"Brown Peter F","year":"1993","journal-title":"Computational Linguistics"},{"key":"p_4","first-page":"95","volume":"43","author":"Darroch J. N.","year":"1974","journal-title":"Annals of Mathematical Statistics"},{"issue":"1","key":"p_18","first-page":"107","volume":"25","author":"Melamed Dan I","year":"1999","journal-title":"Computational Linguistics"},{"key":"p_26","doi-asserted-by":"publisher","DOI":"10.1162\/089120103321337421"},{"key":"p_30","doi-asserted-by":"publisher","DOI":"10.1162\/089120103322711578"}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/089120105775299168","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:42:31Z","timestamp":1615585351000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/31\/4\/477-504\/1891"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,12]]},"references-count":6,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2005,12]]}},"alternative-id":["10.1162\/089120105775299168"],"URL":"https:\/\/doi.org\/10.1162\/089120105775299168","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,12]]}}}