{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,25]],"date-time":"2025-02-25T05:28:06Z","timestamp":1740461286896,"version":"3.37.3"},"reference-count":0,"publisher":"IOS Press","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010]]},"abstract":"<jats:p>This paper presents work on collecting comparable corpora for 9 language pairs: Estonian-English, Latvian-English, Lithuanian-English, Greek-English, Greek-Romanian, Croatian-English, Romanian-English, Romanian-German and Slovenian-English. The objective of this work was to gather texts from the same domains and genres and with a similar level of comparability in order to use them as a starting point in defining criteria and metrics of comparability. These criteria and metrics will be applied to comparable texts to determine their suitability for use in Statistical Machine Translation, particularly in the case where translation is performed from or into under-resourced languages for which substantial parallel corpora are unavailable. 
The size of collected corpora is about 1 million words for each under-resourced language.<\/jats:p>","DOI":"10.3233\/978-1-60750-641-6-161","type":"book-chapter","created":{"date-parts":[[2025,2,24]],"date-time":"2025-02-24T11:59:50Z","timestamp":1740398390000},"source":"Crossref","is-referenced-by-count":0,"title":["A Collection of Comparable Corpora for Under-resourced Languages"],"prefix":"10.3233","author":[{"family":"Skadiņa Inguna","sequence":"additional","affiliation":[]},{"family":"Aker Ahmet","sequence":"additional","affiliation":[]},{"family":"Giouli Voula","sequence":"additional","affiliation":[]},{"family":"Tufis Dan","sequence":"additional","affiliation":[]},{"family":"Gaizauskas Robert","sequence":"additional","affiliation":[]},{"family":"Mieriņa Madara","sequence":"additional","affiliation":[]},{"family":"Mastropavlos Nikos","sequence":"additional","affiliation":[]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","Human Language Technologies – The Baltic Perspective"],"original-title":[],"deposited":{"date-parts":[[2025,2,24]],"date-time":"2025-02-24T12:22:25Z","timestamp":1740399745000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.medra.org\/servlet\/aliasResolver?alias=iospressISSNISBN&issn=0922-6389&volume=219&spage=161"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010]]},"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/978-1-60750-641-6-161","relation":{},"ISSN":["0922-6389"],"issn-type":[{"value":"0922-6389","type":"print"}],"subject":[],"published":{"date-parts":[[2010]]}}}