{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:55:26Z","timestamp":1760237726791,"version":"build-2065373602"},"reference-count":30,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2020,6,11]],"date-time":"2020-06-11T00:00:00Z","timestamp":1591833600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["01UG1620G"],"award-info":[{"award-number":["01UG1620G"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Publications"],"abstract":"<jats:p>This article describes the development of the digital infrastructure at a research data centre for audio-visual linguistic research data, the Hamburg Centre for Language Corpora (HZSK) at the University of Hamburg in Germany, over the past ten years. The typical resource hosted in the HZSK Repository, the core component of the infrastructure, is a collection of recordings with time-aligned transcripts and additional contextual data, a spoken language corpus. Since the centre has a thematic focus on multilingualism and linguistic diversity and provides its service to researchers within linguistics and other disciplines, the development of the infrastructure was driven by diverse usage scenarios and user needs on the one hand, and by the common technical requirements for certified service centres of the CLARIN infrastructure on the other. 
Beyond the technical details, the article also aims to be a contribution to the discussion on responsibilities and services within emerging digital research data infrastructures and the fundamental issues in sustainability of research software engineering, concluding that in order to truly cater to user needs across the research data lifecycle, we still need to bridge the gap between discipline-specific research methods in the process of digitalisation and generic digital research data management approaches.<\/jats:p>","DOI":"10.3390\/publications8020033","type":"journal-article","created":{"date-parts":[[2020,6,12]],"date-time":"2020-06-12T05:02:24Z","timestamp":1591938144000},"page":"33","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Providing Digital Infrastructure for Audio-Visual Linguistic Research Data with Diverse Usage Scenarios: Lessons Learnt"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4655-6082","authenticated-orcid":false,"given":"Hanna","family":"Hedeland","sequence":"first","affiliation":[{"name":"Hamburg Centre for Language Corpora\/CLARIN-D, Universit\u00e4t Hamburg, 22765 Hamburg, Germany"}]}],"member":"1968","published-online":{"date-parts":[[2020,6,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Durand, J., Gut, U., and Kristoffersen, G. (2014). EXMARaLDA. Handbook on Corpus Phonology, Oxford University Press.","DOI":"10.1093\/oxfordhb\/9780199571932.001.0001"},{"key":"ref_2","unstructured":"Hedeland, H., Schmidt, T., and W\u00f6rner, K. (2011). Multilingual Corpora at the Hamburg Centre for Language Corpora. 
Multilingual Resources and Multilingual Applications, Proceedings of the Conference of the German Society for Computational Linguistics and Language Technology (GSCL) 2011, Universit\u00e4t Hamburg."},{"key":"ref_3","unstructured":"Wittenburg, P., van Uytvanck, D., Zastrow, T., Stra\u0148\u00e1k, P., Broeder, D., Schiel, F., Boehlke, V., Reichel, U., and Offersgaard, L. (2019). CLARIN B Centre Checklist (CE-2013-0095), CLARIN ERIC. Technical Report."},{"key":"ref_4","unstructured":"Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., and Odijk, J. (2016). The BAS Speech Data Repository. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA)."},{"key":"ref_5","unstructured":"Calzolari, N., Choukri, K., Declerck, T., Dogan, M.U., Maegaard, B., Mariani, J., Odijk, J., and Piperidis, S. (2012). The Language Archive\u2014A new hub for language resources. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), European Language Resources Association (ELRA)."},{"key":"ref_6","unstructured":"Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., and Odijk, J. (2016). FLAT: Constructing a CLARIN Compatible Home for Language Resources. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA)."},{"key":"ref_7","unstructured":"Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., and Piperidis, S. (2014). The database for spoken German\u2014DGD2. Proceedings of the Ninth Conference on International Language Resources and Evaluation (LREC 2014), European Language Resources Association (ELRA)."},{"key":"ref_8","unstructured":"Lehmberg, T. (2015). 
Wissenstransfer und Wissensressourcen: Support und Helpdesk in den Digital Humanities. Forschungsdaten in den Geisteswissenschaften (FORGE 2015). Programm und Abstracts, Universit\u00e4t Hamburg."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Sambale, H., Hedeland, H., and Pirinen, T. User Support for the Digital Humanities. Selected Papers from the CLARIN Annual Conference 2019, Link\u00f6ping University Electronic Press, Link\u00f6pings Universitet. to appear.","DOI":"10.3384\/ecp2020172014"},{"key":"ref_10","unstructured":"Calzolari, N., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Hasida, K., Isahara, H., Maegaard, B., Mariani, J., and Mazo, H. (2018). Introducing the CLARIN knowledge centre for linguistic diversity and language documentation. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA)."},{"key":"ref_11","unstructured":"Hedeland, H., and Ferger, A. Towards Continuous Quality Control for Spoken Language Corpora. International Journal of Digital Curation, University of Edinburgh. to appear."},{"key":"ref_12","unstructured":"Ochs, E., and Schieffelin, B. (1979). Transcription as theory. Developmental Pragmatics, Academic Press."},{"key":"ref_13","unstructured":"Schiffrin, D., Tannen, D., and Hamilton, H. (2001). The Transcription of Discourse. The Handbook of Discourse Analysis, Blackwell."},{"key":"ref_14","unstructured":"Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., and Odijk, J. (2016). User, who art thou? User profiling for oral corpus platforms. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA)."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Meyer, B., and Apfelbaum, B. (2010). Nurses as interpreters. 
Aspects of interpreter training for bilingual medical employees. Multilingualism at Work. From Policies to Practices in Public, Medical, and Business Settings, Benjamins.","DOI":"10.1075\/hsm.9.09mey"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Jettka, D., and Stein, D. (2014). The HZSK Repository: Implementation, Features, and Use Cases of a Repository for Spoken Language Corpora. D-Lib Mag., 20.","DOI":"10.1045\/september2014-jettka"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"160018","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR Guiding Principles for Scientific Data Management and Stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Sci. Data"},{"key":"ref_18","unstructured":"Pirinen, T., Jettka, D., and Hedeland, H. (2017). Developing a CLARIN compatible AAI solution for academic and restricted resources. CLARIN Annual Conference 2017 Book of Abstracts, CLARIN ERIC."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Durand, J., Gut, U., and Kristoffersen, G. (2014). ELAN: Multimedia annotation application. Handbook on Corpus Phonology, Oxford University Press.","DOI":"10.1093\/oxfordhb\/9780199571932.001.0001"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/S0167-6393(00)00067-4","article-title":"Transcriber: Development and use of a tool for assisting speech corpora production","volume":"33","author":"Barras","year":"2000","journal-title":"Speech Commun."},{"key":"ref_21","first-page":"341","article-title":"Praat, a system for doing phonetics by computer","volume":"5","author":"Boersma","year":"2001","journal-title":"Glot Int."},{"key":"ref_22","unstructured":"MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk, Volume I, Lawrence Erlbaum. [3rd ed.]."},{"key":"ref_23","unstructured":"ISO\/TC 37\/SC 4 (2016). Language Resource Management\u2014Transcription of Spoken Language, International Organization for Standardization. 
Standard ISO 24624:2016."},{"key":"ref_24","unstructured":"TEI Consortium (2016). TEI P5: Guidelines for Electronic Text Encoding and Interchange, TEI Consortium. Technical Report, Version 3.1.0, 2016-12-15."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/llc\/fqu057","article-title":"ANNIS3: A new architecture for generic corpus query and visualization","volume":"31","author":"Krause","year":"2016","journal-title":"Digit. Scholarsh. Humanit."},{"key":"ref_26","unstructured":"Yimam, S.M., Gurevych, I., Eckart de Castilho, R., and Biemann, C. (2013). WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics (ACL)."},{"key":"ref_27","unstructured":"Hinrichs, E., Hinrichs, M., and Zastrow, T. (2010). WebLicht: Web-Based LRT Services for German. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics (ACL)."},{"key":"ref_28","unstructured":"Schmidt, T., Hedeland, H., and Jettka, D. (2017). Conversion and Annotation Web Services for Spoken Language Data in CLARIN. Selected Papers from the CLARIN Annual Conference 2016, Link\u00f6ping University Electronic Press, Link\u00f6pings Universitet."},{"key":"ref_29","unstructured":"Remus, S., Hedeland, H., Ferger, A., B\u00fchrig, K., and Biemann, C. (2019). WebAnno-MM: EXMARaLDA meets WebAnno. Selected Papers from the CLARIN Annual Conference 2018, Link\u00f6ping University Electronic Press, Link\u00f6pings Universitet."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Arkhangelskiy, T., Ferger, A., and Hedeland, H. (2019). Uralic multimedia corpora: ISO\/TEI corpus data in the project INEL. 
Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages, Association for Computational Linguistics.","DOI":"10.18653\/v1\/W19-0310"}],"container-title":["Publications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2304-6775\/8\/2\/33\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:37:40Z","timestamp":1760175460000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2304-6775\/8\/2\/33"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,11]]},"references-count":30,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2020,6]]}},"alternative-id":["publications8020033"],"URL":"https:\/\/doi.org\/10.3390\/publications8020033","relation":{},"ISSN":["2304-6775"],"issn-type":[{"type":"electronic","value":"2304-6775"}],"subject":[],"published":{"date-parts":[[2020,6,11]]}}}