{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,2,19]],"date-time":"2023-02-19T23:06:06Z","timestamp":1676847966404},"reference-count":33,"publisher":"Cambridge University Press (CUP)","issue":"1","license":[{"start":{"date-parts":[[2011,9,13]],"date-time":"2011-09-13T00:00:00Z","timestamp":1315872000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2013,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The paper presents a novel unified algorithm for aligning sentences with their translations in bilingual data. With the help of ideas from a stack-based dynamic programming decoder for speech recognition (Ney 1984), the search is parametrized in a novel way such that the unified algorithm can be used on various types of data that have been previously handled by separate implementations: the extracted text chunk pairs can be either sub-sentential pairs, one-to-one, or many-to-many sentence-level pairs. The one-stage search algorithm is carried out in a single run over the data. Its memory requirements are independent of the length of the source document, and it is applicable to sentence-level parallel as well as comparable data. With the help of a unified beam-search candidate pruning, the algorithm is very efficient: it avoids any document-level pre-filtering and uses less restrictive sentence-level filtering. Results are presented on a Russian\u2013English, a Spanish\u2013English, and an Arabic\u2013English extraction task. Based on simple word-based scoring features, text chunk pairs are extracted out of several trillion candidates, where the search is carried out on 300 processors in parallel.<\/jats:p>","DOI":"10.1017\/s135132491100026x","type":"journal-article","created":{"date-parts":[[2011,9,13]],"date-time":"2011-09-13T09:53:17Z","timestamp":1315907597000},"page":"33-60","source":"Crossref","is-referenced-by-count":2,"title":["A unified alignment algorithm for bilingual data"],"prefix":"10.1017","volume":"19","author":[{"given":"CHRISTOPH","family":"TILLMANN","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"SANJIKA","family":"HEWAVITHARANA","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"56","published-online":{"date-parts":[[2011,9,13]]},"reference":[{"key":"S135132491100026X_ref18","doi-asserted-by":"publisher","DOI":"10.1162\/0891201042544884"},{"key":"S135132491100026X_ref19","first-page":"161","volume-title":"Proceedings of the Joint HLT and NAACL Conference (HLT 04)","author":"Och","year":"2004"},{"key":"S135132491100026X_ref17","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1984.1164320"},{"key":"S135132491100026X_ref11","first-page":"489","volume-title":"Proceedings of LREC 06","author":"Ma","year":"2006"},{"key":"S135132491100026X_ref4","first-page":"9","volume-title":"Proceedings of ACL 93","author":"Chen","year":"1993"},{"key":"S135132491100026X_ref3","first-page":"263","article-title":"The mathematics of statistical machine translation: parameter estimation","volume":"19","author":"Brown","year":"1993","journal-title":"Computational Linguistics"},{"key":"S135132491100026X_ref2","first-page":"169","volume-title":"Proceedings of ACL 91","author":"Brown","year":"1991"},{"key":"S135132491100026X_ref8","first-page":"177","volume-title":"Proceedings of ACL 91","author":"Gale","year":"1991"},{"key":"S135132491100026X_ref27","first-page":"856","volume-title":"Proceedings of EMNLP08","author":"Snover","year":"2008"},{"key":"S135132491100026X_ref6","first-page":"57","volume-title":"Proceedings of EMNLP 04","author":"Fung","year":"2004"},{"key":"S135132491100026X_ref21","first-page":"2095","volume-title":"Proceedings of ICASSP 96","author":"Ortmanns","year":"1996"},{"key":"S135132491100026X_ref22","first-page":"311","volume-title":"Proceedings of ACL 02","author":"Papineni","year":"2002"},{"key":"S135132491100026X_ref20","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4419-7713-7"},{"key":"S135132491100026X_ref14","first-page":"135","volume-title":"Proceedings of AMTA 05","author":"Moore","year":"2002"},{"key":"S135132491100026X_ref7","first-page":"61","volume-title":"Proceedings of ACL Workshop on Building and Using Comparable Corpora","author":"Hewavitharana","year":"2011"},{"key":"S135132491100026X_ref16","first-page":"81","volume-title":"Proceedings of COLING\/ACL 06","author":"Munteanu","year":"2006"},{"key":"S135132491100026X_ref1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1982.1171441"},{"key":"S135132491100026X_ref10","volume-title":"Proceedings of AMTA 04","author":"Koehn","year":"2004"},{"key":"S135132491100026X_ref9","first-page":"127","volume-title":"Proceedings of HLT-NAACL 03","author":"Koehn","year":"2003"},{"key":"S135132491100026X_ref33","first-page":"745","volume-title":"IEEE International Conference on Data Mining (ICDM 2002)","author":"Zhao","year":"2002"},{"key":"S135132491100026X_ref23","volume-title":"English Gigaword Corpus","author":"Parker","year":"2009"},{"key":"S135132491100026X_ref28","first-page":"93","volume-title":"Proceedings of HLT\/NAACL 09","author":"Tillmann","year":"2009"},{"key":"S135132491100026X_ref29","first-page":"1","article-title":"A block bigram prediction model for statistical machine translation","volume":"4","author":"Tillmann","year":"2007","journal-title":"ACM-TSLP"},{"key":"S135132491100026X_ref32","doi-asserted-by":"publisher","DOI":"10.3115\/1075096.1075106"},{"key":"S135132491100026X_ref25","first-page":"321","volume-title":"Proceedings of the MT Summit XI","author":"Quirk","year":"2007"},{"key":"S135132491100026X_ref12","first-page":"107","article-title":"Bitext maps and alignment via pattern recognition","volume":"25","author":"Melamed","year":"1999","journal-title":"Computational Linguistics"},{"key":"S135132491100026X_ref13","volume-title":"Spanish Gigaword Corpus","author":"Mendonca","year":"2009"},{"key":"S135132491100026X_ref24","first-page":"114","volume-title":"The Comp. Volume of the Proceedings of ACL 04","author":"Pike","year":"2004"},{"key":"S135132491100026X_ref26","doi-asserted-by":"publisher","DOI":"10.1162\/089120103322711578"},{"key":"S135132491100026X_ref31","first-page":"225","volume-title":"Proceedings of the ACL-IJCNLP 2009 Conference","author":"Tillmann","year":"2009"},{"key":"S135132491100026X_ref30","first-page":"9","volume-title":"Proceedings of the Workshop CHPSLP at HLT 06","author":"Tillmann","year":"2006"},{"key":"S135132491100026X_ref5","first-page":"1","article-title":"Segmentation and alignment of parallel text for statistical machine translation","volume":"12","author":"Deng","year":"2006","journal-title":"Natural Language Engineering"},{"key":"S135132491100026X_ref15","doi-asserted-by":"publisher","DOI":"10.1162\/089120105775299168"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S135132491100026X","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,4,24]],"date-time":"2019-04-24T20:02:11Z","timestamp":1556136131000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S135132491100026X\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,9,13]]},"references-count":33,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2013,1]]}},"alternative-id":["S135132491100026X"],"URL":"https:\/\/doi.org\/10.1017\/s135132491100026x","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,9,13]]}}}