{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T02:05:12Z","timestamp":1762999512555,"version":"build-2065373602"},"reference-count":34,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2019,9,28]],"date-time":"2019-09-28T00:00:00Z","timestamp":1569628800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>This paper proposes a new collaborative and inclusive model for Knowledge Organization Systems (KOS) for sustaining cultural heritage and language diversity. It is based on contributions of end-users as well as scientific and scholarly communities from across borders, languages, nations, continents, and disciplines. It consists in collecting knowledge about all worldwide translations of one original work and sharing that data through a digital and interactive global knowledge map. Collected translations are processed in order to build multilingual parallel corpora for a large number of under-resourced languages as well as to highlight the transnational circulation of knowledge. Building such corpora is vital in preserving and expanding linguistic and traditional diversity. Our first experiment was conducted on the world-famous and well-traveled American novel Adventures of Huckleberry Finn by the American author Mark Twain. This paper reports on 10 parallel corpora that are now sentence-aligned pairs of English with Basque (an European under-resourced language), Bulgarian, Dutch, Finnish, German, Hungarian, Polish, Portuguese, Russian, and Ukrainian, processed out of 30 collected translations.<\/jats:p>","DOI":"10.3390\/info10100303","type":"journal-article","created":{"date-parts":[[2019,9,30]],"date-time":"2019-09-30T05:58:33Z","timestamp":1569823113000},"page":"303","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["A Sustainable and Open Access Knowledge Organization Model to Preserve Cultural Heritage and Language Diversity"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8693-8862","authenticated-orcid":false,"given":"Amel","family":"Fraisse","sequence":"first","affiliation":[{"name":"Groupe d\u2019\u00c9tudes et de Recherche Interdisciplinaire en Information et Communication (GERiiCO), Universit\u00e9 de Lille, 59000 Lille, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Laboratoire d\u2019Informatique pour la M\u00e9canique et les Sciences de l\u2019Ing\u00e9nieur-Centre National de la Recherche Scientifique (LIMSI-CNRS), Universit\u00e9 Paris-Saclay, 91400 Orsay, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alex","family":"Zhai","sequence":"additional","affiliation":[{"name":"Laboratoire d\u2019Informatique pour la M\u00e9canique et les Sciences de l\u2019Ing\u00e9nieur-Centre National de la Recherche Scientifique (LIMSI-CNRS), Universit\u00e9 Paris-Saclay, 91400 Orsay, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ronald","family":"Jenn","sequence":"additional","affiliation":[{"name":"Centre d\u2019Etudes en Civilisations, Langues et Litt\u00e9ratures Etrang\u00e8res (CECILLE), Universit\u00e9 de Lille, 59000 Lille, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shelley","family":"Fisher Fishkin","sequence":"additional","affiliation":[{"name":"Department of English, Stanford University, 94305 California, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8410-4808","authenticated-orcid":false,"given":"Pierre","family":"Zweigenbaum","sequence":"additional","affiliation":[{"name":"Laboratoire d\u2019Informatique pour la M\u00e9canique et les Sciences de l\u2019Ing\u00e9nieur-Centre National de la Recherche Scientifique (LIMSI-CNRS), Universit\u00e9 Paris-Saclay, 91400 Orsay, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Laurence","family":"Favier","sequence":"additional","affiliation":[{"name":"Groupe d\u2019\u00c9tudes et de Recherche Interdisciplinaire en Information et Communication (GERiiCO), Universit\u00e9 de Lille, 59000 Lille, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Widad","family":"Mustafa El Hadi","sequence":"additional","affiliation":[{"name":"Groupe d\u2019\u00c9tudes et de Recherche Interdisciplinaire en Information et Communication (GERiiCO), Universit\u00e9 de Lille, 59000 Lille, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,9,28]]},"reference":[{"key":"ref_1","unstructured":"Otlet, P. (1934). Trait\u00e9 de Documentation: Le livre sur le Livre: Th\u00e9orie et Pratique, Mundaneum."},{"key":"ref_2","unstructured":"Krauwer, S. (2003, January 27\u201329). The Basic Language Resource Kit (BLARK) as the First Milestone for the Language Resources Roadmap. Proceedings of the International Workshop Speech and Computer, Moscow, Russia."},{"key":"ref_3","unstructured":"Arppe, A., Lachler, J., Trosterud, T., Antonsen, L., and Moshagen, S.N. (2016, January 23). Basic Language Resource Kits for Endangered Languages: A Case Study of Plains Cree. Proceedings of the 2nd Workshop on Collaboration and Computing for Under-Resourced Languages Workshop (CCURL 2016), Portoro\u017e, Slovenia."},{"key":"ref_4","unstructured":"Scannell, K. (2007). The Crubadan Project: Corpus building for under-resourced languages. Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop, University Press Leuven."},{"key":"ref_5","unstructured":"Fraisse, A., Boitet, C., Blanchon, H., and Bellynck, V. (2009, January 6\u20138). A Solution for in Context and Collaborative Localization of most Commercial and Free Software. Proceedings of the 4th Language and Technology Conference (LTC 2009), Pozna\u0144, Poland."},{"key":"ref_6","unstructured":"Fraisse, A., Boitet, C., and Bellynck, V. (2012, January 8\u201315). An In Context and Collaborative Software Localisation Model. Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India."},{"key":"ref_7","unstructured":"Roukos, S., Graff, D., and Melamed, D. (1995). Hansard French\/English, Linguistic Data Consortium."},{"key":"ref_8","unstructured":"Koehn, P. (2005, January 12\u201316). Europarl: A Parallel Corpus for Statistical Machine Translation. Proceedings of the Tenth Machine Translation Summit, Phuket, Thailand."},{"key":"ref_9","unstructured":"Ziemski, M., Junczys-Dowmunt, M., and Pouliquen, B. (2016, January 23\u201328). The United Nations Parallel Corpus V1.0. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portoro\u017e, Slovenia."},{"key":"ref_10","unstructured":"Otlet, P. (1935). Monde: Essai d\u2019universalisme: Connaissance du Monde, Sentiment du Monde, Action Organis\u00e9e et Plan du Monde, Mundaneum."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1080\/00049670.1992.10755606","article-title":"The legacy of Paul Otlet, pioneer of information science","volume":"41","author":"Rayward","year":"1992","journal-title":"Aust. Libr. J."},{"key":"ref_12","first-page":"11","article-title":"Multilingual Thesaurus Construction-Integrating the Views of Different Cultures in One Gateway to Knowledge and Concepts","volume":"17","author":"Hudon","year":"1997","journal-title":"Inf. Serv. Use"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"156","DOI":"10.3828\/indexer.1999.21.4.4","article-title":"Accessing Documents and Information in a World without Frontiers","volume":"21","author":"Hudon","year":"1999","journal-title":"Index"},{"key":"ref_14","first-page":"91","article-title":"Knowledge Organization in the Cross-Cultural and Multicultural Society","volume":"11","year":"2008","journal-title":"Adv. Knowl. Org."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1023\/A:1001798929185","article-title":"The Bible as a Parallel Corpus: Annotating the \u2018Book of 2000 Tongues\u2019","volume":"33","author":"Resnik","year":"1999","journal-title":"Comput. Humanit."},{"key":"ref_16","unstructured":"Mayer, T., and Cysouw, M. (2014, January 26\u201331). Creating a Massively Parallel Bible Corpus. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1007\/s10579-014-9287-y","article-title":"A massively parallel corpus: The Bible in 100 languages","volume":"49","author":"Christodouloupoulos","year":"2015","journal-title":"Lang. Resour. Eval."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Choudhary, N., and Jha, G.N. (2014). Creating Multilingual Parallel Corpora in Indian Languages. Human Language Technology Challenges for Computer Science and Linguistics, Springer International Publishing.","DOI":"10.1007\/978-3-319-14120-6_43"},{"key":"ref_19","unstructured":"Jha, G.N. (2010, January 17\u201323). The TDIL Program and the Indian Language Corpora Initiative (ILCI). Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta."},{"key":"ref_20","unstructured":"Steinberger, R., Pouliquen, B., Widiger, A., Ignat, C., Erjavec, T., Tufi\u015f, D., and Varga, D. (2006, January 24\u201326). The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. Proceedings of the 5th International Conference on Language Resources and Evaluation, Genoa, Italy."},{"key":"ref_21","first-page":"95","article-title":"Parallel texts: Using translational equivalents in linguistic typology","volume":"60","author":"Cysouw","year":"2007","journal-title":"Sprachtypol. Univers. STUF"},{"key":"ref_22","unstructured":"Druskat, S., Gast, V., Krause, T., and Zipser, F. (2016, January 23\u201328). corpus-tools. org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portoro\u017e, Slovenia."},{"key":"ref_23","unstructured":"Gilmanov, T., Scrivner, O., and K\u00fcbler, S. (2014, January 26\u201331). SWIFT Aligner, A Multifunctional Tool for Parallel Corpora: Visualization, Word Alignment, and (Morpho)-Syntactic Cross-Language Transfer. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland."},{"key":"ref_24","unstructured":"Smith, N., and Jahr, M. (June, January 30). Cairo: An Alignment Visualization Tool. Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece."},{"key":"ref_25","unstructured":"Gomes, F.T., Pardo, T.A., and de Medeiros Caseli, H. (2007, January 5\u20136). Visualtca: Uma ferramenta visual on-line para alinhamento sentencial de textos paralelos. Proceedings of the Anais do XXVII Congresso da Sociedade Brasileira de Computa\u00e7\u00e3o-V Workshop em Tecnologia da Informa\u00e7\u00e3o e da Linguagem Humana (TIL), Rio de Janeiro."},{"key":"ref_26","unstructured":"Fleury, S., and Zimina, M. (2019, September 20). Exploring Translation Corpora with MkAlign. Available online: https:\/\/www.researchgate.net\/profile\/Maria_Zimina5\/publication\/49135660_Exploring_Translation_Corpora_with_MkAlign\/links\/5baa93ab299bf13e604c87eb\/Exploring-Translation-Corpora-with-MkAlign.pdf."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"429","DOI":"10.3390\/fi5030429","article-title":"Libraries\u2019 Role in Curating and Exposing Big Data","volume":"5","author":"Teets","year":"2013","journal-title":"Future Int."},{"key":"ref_28","unstructured":"Cassin, B., and Ducimeti\u00e8re, N. (2017). Les Routes de la traduction. Babel \u00e0 Gen\u00e8ve, Gallimard."},{"key":"ref_29","unstructured":"Fishkin, S.F. (2019, September 20). DEEP MAPS: A Brief for Digital Palimpsest Mapping Projects (DPMPs) or \u2018Deep Maps\u2019. Available online: https:\/\/escholarship.org\/uc\/item\/92v100t0."},{"key":"ref_30","unstructured":"Rodney, R.M. (1982). Mark Twain International: A Bibliography and Interpretation of His Wordwide Popularity, Greenwood Press."},{"key":"ref_31","unstructured":"Charles, L. (1885). Adventures of Huckleberry Finn, Webster and Company."},{"key":"ref_32","unstructured":"Fraisse, A., Jenn, R., and Fishkin, S.F. (2018, January 7\u201312). Parallel Corpora for Under-Resourced Languages Using Translated Fictional Texts. Proceedings of the LREC 2018 Workshop CCURL2018\u2014Sustaining Knowledge Diversity in the Digital Age, Miyazaki, Japan."},{"key":"ref_33","first-page":"75","article-title":"A Program for Aligning Sentences in Bilingual Corpora","volume":"19","author":"Gale","year":"1993","journal-title":"Comput. Linguist."},{"key":"ref_34","first-page":"263","article-title":"The Mathematics of Statistical Machine Translation: Parameter Estimation","volume":"19","author":"Brown","year":"1993","journal-title":"Comput. Linguist."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/10\/303\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:25:37Z","timestamp":1760189137000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/10\/303"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,9,28]]},"references-count":34,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2019,10]]}},"alternative-id":["info10100303"],"URL":"https:\/\/doi.org\/10.3390\/info10100303","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2019,9,28]]}}}