{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T16:46:04Z","timestamp":1765039564180,"version":"3.37.3"},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,11,22]],"date-time":"2020-11-22T00:00:00Z","timestamp":1606003200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,11,22]],"date-time":"2020-11-22T00:00:00Z","timestamp":1606003200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100006764","name":"Technische Universit\u00e4t Berlin","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006764","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Qual User Exp"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Subjective speech quality assessment has traditionally been carried out in laboratory environments under controlled conditions. With the advent of crowdsourcing platforms tasks, which need human intelligence, can be resolved by crowd workers over the Internet. Crowdsourcing also offers a new paradigm for speech quality assessment, promising higher ecological validity of the quality judgments at the expense of potentially lower reliability. This paper compares laboratory-based and crowdsourcing-based speech quality assessments in terms of comparability of results and efficiency. For this purpose, three pairs of listening-only tests have been carried out using three different crowdsourcing platforms and following the ITU-T Recommendation P.808. In each test, listeners judge the overall quality of the speech sample following the Absolute Category Rating procedure. 
We compare the results of the crowdsourcing approach with the results of standard laboratory tests performed according to the ITU-T Recommendation P.800. Results show that in most cases, both paradigms lead to comparable results. Notable differences are discussed with respect to their sources, and conclusions are drawn that establish practical guidelines for crowdsourcing-based speech quality assessment.<\/jats:p>","DOI":"10.1007\/s41233-020-00042-1","type":"journal-article","created":{"date-parts":[[2020,11,22]],"date-time":"2020-11-22T09:10:17Z","timestamp":1606036217000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Towards speech quality assessment using a crowdsourcing approach: evaluation of standardized methods"],"prefix":"10.1007","volume":"6","author":[{"given":"Babak","family":"Naderi","sequence":"first","affiliation":[]},{"given":"Rafael","family":"Zequeira Jim\u00e9nez","sequence":"additional","affiliation":[]},{"given":"Matthias","family":"Hirth","sequence":"additional","affiliation":[]},{"given":"Sebastian","family":"M\u00f6ller","sequence":"additional","affiliation":[]},{"given":"Florian","family":"Metzger","sequence":"additional","affiliation":[]},{"given":"Tobias","family":"Ho\u00dffeld","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,11,22]]},"reference":[{"issue":"2","key":"42_CR1","doi-asserted-by":"publisher","first-page":"541","DOI":"10.1109\/TMM.2013.2291663","volume":"16","author":"T Ho\u00dffeld","year":"2014","unstructured":"Ho\u00dffeld T, Keimel C, Hirth M, Gardlo B, Habigt J, Diepold K, Tran-Gia P (2014) Best practices for QoE crowdtesting: QoE assessment with crowdsourcing. 
IEEE Trans Multimed 16(2):541\u2013558","journal-title":"IEEE Trans Multimed"},{"key":"42_CR2","unstructured":"ITU-T Recommendation P.800 (1996) Methods for subjective determination of transmission quality.\u00a0International Telecommunication Union, Geneva"},{"key":"42_CR3","unstructured":"ITU-T Recommendation P.808 (2018) Subjective evaluation of speech quality with a crowdsourcing approach.\u00a0International Telecommunication Union, Geneva"},{"key":"42_CR4","doi-asserted-by":"publisher","first-page":"154","DOI":"10.1007\/978-3-319-66435-4_7","volume-title":"Evaluation in the crowd. Crowdsourcing and human-centered experiments","author":"S Egger-Lampl","year":"2017","unstructured":"Egger-Lampl S, Redi J, Ho\u00dffeld T, Hirth M, M\u00f6ller S, Naderi B, Keimel C, Saupe D (2017) Crowdsourcing quality of experience experiments. In: Archambault D, Purchase H, Ho\u00dffeld T (eds) Evaluation in the crowd. Crowdsourcing and human-centered experiments. Springer, Cham, pp 154\u2013190"},{"key":"42_CR5","doi-asserted-by":"crossref","unstructured":"Hosu V, Lin H,\u00a0Saupe D (2018) Expertise screening in crowdsourcing image quality. In: 2018 Tenth international conference on quality of multimedia experience (QoMEX), pp 1\u20136","DOI":"10.1109\/QoMEX.2018.8463427"},{"issue":"7","key":"42_CR6","doi-asserted-by":"publisher","first-page":"1338","DOI":"10.1109\/TMM.2016.2559942","volume":"18","author":"E Siahaan","year":"2016","unstructured":"Siahaan E, Hanjalic A, Redi J (2016) A Reliable Methodology to Collect Ground Truth Data of Image Aesthetic Appeal. IEEE Trans Multim 18(7):1338\u20131350","journal-title":"IEEE Trans Multim"},{"key":"42_CR7","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-016-3948-3","author":"J S\u00f8gaard","year":"2016","unstructured":"S\u00f8gaard J, Shahid M, Pokhrel J, Brunnstr\u00f6m K (2016) On subjective quality assessment of adaptive video streaming via crowdsourcing and laboratory based experiments. Multim Tools Appl. 
https:\/\/doi.org\/10.1007\/s11042-016-3948-3","journal-title":"Multim Tools Appl"},{"key":"42_CR8","doi-asserted-by":"crossref","unstructured":"Cartwright M, \u00a0Pardo B, Mysore GJ, Hoffman M (2016) Fast and easy crowd sourced perceptual audio evaluation. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 619\u2013623","DOI":"10.1109\/ICASSP.2016.7471749"},{"key":"42_CR9","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1016\/j.comnet.2015.05.021","volume":"90","author":"T Volk","year":"2015","unstructured":"Volk T, Keimel C, Moosmeier M, Diepold K (2015) Crowdsourcing vs. laboratory experiments - QoE evaluation of binaural playback in a teleconference scenario. Comput Netw 90:99\u2013109","journal-title":"Comput Netw"},{"key":"42_CR10","doi-asserted-by":"crossref","unstructured":"Naderi B, Polzehl T, Wechsung I, K\u00f6ster F, M\u00f6ller S (2015) Effect of trapping questions on the reliability of speech quality judgments in a crowd sourcing paradigm. In: INTERSPEECH. ISCA 2799-2803","DOI":"10.21437\/Interspeech.2015-589"},{"key":"42_CR11","doi-asserted-by":"crossref","unstructured":"Zequeira Jim\u00e9nez R, Fern\u00e1ndez Gallardo L, M\u00f6ller S (2018) Influence of number of stimuli for subjective speech quality assessment in crowdsourcing. In: 2018 Tenth international conference on quality of multimedia experience (QoMEX), pp 1\u20136","DOI":"10.1109\/QoMEX.2018.8463298"},{"key":"42_CR12","doi-asserted-by":"crossref","unstructured":"Polzehl T, \u00a0Naderi B, K\u00f6ster F, M\u00f6ller S (2015) Robustness in speech quality assessment and temporal training expiry in mobile crowdsourcing environments. In: Sixteenth annual conference of the international speech communication association","DOI":"10.21437\/Interspeech.2015-588"},{"key":"42_CR13","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1007\/978-3-319-66435-4_2","volume-title":"Evaluation in the Crowd. 
Crowdsourcing and human-centered experiments","author":"U Gadiraju","year":"2017","unstructured":"Gadiraju U, M\u00f6ller S, N\u00f6llenburg M, Saupe D, Egger-Lampl S, Archambault D, Fisher B (2017) Crowdsourcing versus the laboratory: towards human-centered experiments using the crowd. In: Archambault D, Purchase H, Ho\u00dffeld T (eds) Evaluation in the Crowd. Crowdsourcing and human-centered experiments. Springer, Cham, pp 6\u201326"},{"issue":"2","key":"42_CR14","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1109\/MNET.2010.5430141","volume":"24","author":"K-T Chen","year":"2010","unstructured":"Chen K-T, Chang C-J, Wu C-C, Chang Y-C, Lei C-L (2010) Quadrant of Euphoria: a crowdsourcing platform for QoE assessment. IEEE Network 24(2):28\u201335","journal-title":"IEEE Network"},{"key":"42_CR15","unstructured":"ITU-R Recommendation BT.500-11 (2002) Methodology for the subjective assessment of the quality of television pictures.\u00a0International Telecommunication Union, Geneva"},{"key":"42_CR16","unstructured":"ITU-R Recommendation P.910 (2008) Subjective video quality assessment methods for multimedia applications.\u00a0International Telecommunication Union, Geneva"},{"key":"42_CR17","doi-asserted-by":"crossref","unstructured":"Ribeiro FP, Flor\u00eancio DAF, Zhang C, Seltzer ML (2011) \u2018CROWDMOS: an approach for crowdsourcing mean opinion score studies. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 2416\u20132419","DOI":"10.1109\/ICASSP.2011.5946971"},{"key":"42_CR18","unstructured":"ITU-R Recommendation BS.1534-3 (2014) Method for the subjective assessment of intermediate quality level of audio systems.\u00a0International Telecommunication Union, Geneva"},{"key":"42_CR19","doi-asserted-by":"crossref","unstructured":"Ribeiro F,\u00a0Florencio D,\u00a0Nascimento V (2011) Crowdsourcing subjective image quality evaluation. 
In: 18th IEEE international conference on image processing, pp 3097\u20133100","DOI":"10.1109\/ICIP.2011.6116320"},{"key":"42_CR20","unstructured":"Sheikh H, Wang Z, Cormack L, Bovik A (2003) Live image quality assessment database, 2003. [Online]. Available http:\/\/live.ece.utexas.edu\/research\/quality\/"},{"key":"42_CR21","doi-asserted-by":"crossref","unstructured":"Keimel C, Habigt J, Horch C, Diepold K (2012) QualityCrowd - a framework for crowd-based quality evaluation. In: 2012 Picture coding symposium, pp 245\u2013248","DOI":"10.1109\/PCS.2012.6213338"},{"key":"42_CR22","doi-asserted-by":"crossref","unstructured":"Ruchaud N,\u00a0Antipov G,\u00a0Korshunov P, Dugelay J-L,\u00a0Ebrahimi T, Berrani S-A (2015) The impact of privacy protection filters on gender recognition. In: Tescher AG (Ed) Applications of digital image processing XXXVIII, vol 9599. International Society for Optics and Photonics.\u00a0SPIE, pp 36\u201347","DOI":"10.1117\/12.2193647"},{"key":"42_CR23","doi-asserted-by":"crossref","unstructured":"Korshunov P, Bernardo MV, Pinheiro AM,\u00a0Ebrahimi T (2015) Impact of tone-mapping algorithms on subjective and objective face recognition in hdr images. In: Proceedings of the fourth international workshop on crowdsourcing for multimedia, ser. CrowdMM\u201915. Association for Computing Machinery, New York, NY, pp 39\u201344","DOI":"10.1145\/2810188.2810195"},{"key":"42_CR24","doi-asserted-by":"crossref","unstructured":"Bonetto M, Korshunov P, Ramponi G,\u00a0Ebrahimi T (2015) Privacy in mini-drone based video surveillance. In: 2015 11th IEEE international conference and workshops on automatic face and gesture recognition (FG), vol\u00a004, pp 1\u20136","DOI":"10.1109\/FG.2015.7285023"},{"key":"42_CR25","unstructured":"Saupe D,\u00a0Hahn F,\u00a0Hosu V,\u00a0Zingman I, Rana M,\u00a0Li S (2016) Crowd workers proven useful: a comparative study of subjective video quality assessment. 
In: 8th International conference on quality of multimedia experience (QoMEX)"},{"key":"42_CR26","doi-asserted-by":"crossref","unstructured":"Ho\u00dffeld T, Seufert M, \u00a0Sieber C,\u00a0Zinner T (2014) Assessing effect sizes of influence factors towards a qoe model for http adaptive streaming. In: 2014 Sixth international workshop on quality of multimedia experience (QoMEX), pp 111\u2013116","DOI":"10.1109\/QoMEX.2014.6982305"},{"key":"42_CR27","volume-title":"\u201cBeaqleJS: HTML5 and JavaScript based framework for the subjective evaluation of audio quality,\u201d in Linux Audio Conference","author":"S Kraft","year":"2014","unstructured":"Kraft S, Z\u00f6lzer U (2014) \u201cBeaqleJS: HTML5 and JavaScript based framework for the subjective evaluation of audio quality,\u201d in Linux Audio Conference. Karlsruhe, DE"},{"key":"42_CR28","volume-title":"Practical procedures for subjective testing","author":"ITU-T Handbook","year":"2011","unstructured":"Handbook ITU-T (2011) Practical procedures for subjective testing. International Telecommunication Union, Geneva"},{"key":"42_CR29","doi-asserted-by":"crossref","unstructured":"Naderi B, Polzehl T, Wechsung I, K\u00f6ster F, M\u00f6ller S (2015) Effect of trapping questions on the reliability of speech quality judgments in a crowdsourcing paradigm. In: Sixteenth annual conference of the international speech communication association","DOI":"10.21437\/Interspeech.2015-589"},{"key":"42_CR30","unstructured":"ITU-T Recommendation P.863 (2018) Perceptual Objective listening quality prediction.\u00a0 International Telecommunication Union, Geneva"},{"key":"42_CR31","doi-asserted-by":"crossref","unstructured":"Martin D, Carpendale S, Gupta N, Ho\u00dffeld T,\u00a0Naderi B,\u00a0Redi J, Siahaan E, \u00a0Wechsung I (2017) Understanding the crowd: ethical and practical matters in the academic use of crowdsourcing. In: Evaluation in the crowd. 
Crowdsourcing and human-centered experiments.\u00a0\u00a0Springer, New York, pp 27\u201369","DOI":"10.1007\/978-3-319-66435-4_3"},{"issue":"1","key":"42_CR32","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1080\/14992020400050004","volume":"43","author":"C Smits","year":"2004","unstructured":"Smits C, Kapteyn TS, Houtgast T (2004) Development and validation of an automatic speech-in-noise screening test by telephone. Int J Audiol 43(1):15\u201328","journal-title":"Int J Audiol"},{"issue":"1","key":"42_CR33","first-page":"6","volume":"54","author":"M Buscherm\u00f6hle","year":"2015","unstructured":"Buscherm\u00f6hle M, Wagener K, Berg D, Meis M, Kollmeier B (2015) The german digit triplets test (part ii): validation and pass\/fail criteria. Zeitschrift f\u00fcr Audiologie 54(1):6\u201313","journal-title":"Zeitschrift f\u00fcr Audiologie"},{"key":"42_CR34","doi-asserted-by":"crossref","unstructured":"Naderi B, M\u00f6ller S (2020) Application of just-noticeable difference in quality as environment suitability test for crowdsourcing speech quality assessment task. In: 12th International conference on quality of multimedia experience (QoMEX).\u00a0IEEE, pp 1\u20136","DOI":"10.1109\/QoMEX48832.2020.9123093"},{"key":"42_CR35","doi-asserted-by":"crossref","unstructured":"Zequeira Jim\u00e9nez R, Mittag G, M\u00f6ller S (2018) Effect of number of stimuli on users perception of different speech degradations. A crowdsourcing case study. In: IEEE international symposium on multimedia (ISM). IEEE, pp 175\u2013179","DOI":"10.1109\/ISM.2018.00-16"},{"key":"42_CR36","unstructured":"ITU-T Recommendation P.1401 (2020) Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models. International Telecommunication Union, Geneva"},{"key":"42_CR37","unstructured":"Ho\u00dffeld T, Heegaard PE, Varela M, \u00a0Skorin-Kapov L (2018) Confidence interval estimators for mos values. 
arXiv preprint arXiv:1806.01126"},{"key":"42_CR38","doi-asserted-by":"crossref","unstructured":"Naderi B, M\u00f6ller S (2020) Transformation of mean opinion scores to avoid misleading of ranked based statistical techniques. In: 12th International conference on quality of multimedia experience (QoMEX).\u00a0IEEE, pp 1\u20133","DOI":"10.1109\/QoMEX48832.2020.9123078"},{"issue":"1","key":"42_CR39","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1007\/s41233-016-0002-1","volume":"1","author":"T Ho\u00dffeld","year":"2016","unstructured":"Ho\u00dffeld T, Heegaard PE, Varela M, M\u00f6ller S (2016) Qoe beyond the mos: an in-depth look at qoe via better metrics and their relation to mos. Quality User Exp 1(1):2","journal-title":"Quality User Exp"},{"key":"42_CR40","doi-asserted-by":"crossref","unstructured":"Naderi B, Hossfeld T, Hirth M, Metzger F, M\u00f6ller S, \u00a0Zequeira Jim\u00e9nez R (2020) Impact of the number of votes on the reliability and validity of subjective speech quality assessment in the crowdsourcing approach. In: 12th international conference on quality of multimedia experience (QoMEX).\u00a0\u00a0IEEE, pp 1\u20136","DOI":"10.1109\/QoMEX48832.2020.9123115"},{"key":"42_CR41","unstructured":"Ho\u00dffeld T, Schatz R, Egger S (2011) Sos: The mos is not enough! In: Third international workshop on quality of multimedia experience. IEEE, pp 131\u2013136"},{"key":"42_CR42","doi-asserted-by":"crossref","unstructured":"Zequeira Jim\u00e9nez R, \u00a0Naderi B, M\u00f6ller S (2020) Effect of environmental noise in speech quality assessment studies using crowdsourcing. In: 12th International conference on quality of multimedia experience (QoMEX).\u00a0\u00a0IEEE, pp 1\u20136","DOI":"10.1109\/QoMEX48832.2020.9123144"},{"key":"42_CR43","doi-asserted-by":"crossref","unstructured":"Naderi B,\u00a0Cutler R (2020) An open source implementation of itu-t recommendation p.808 with validation. 
To appear in INTERSPEECH.\u00a0\u00a0ISCA","DOI":"10.21437\/Interspeech.2020-2665"}],"container-title":["Quality and User Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41233-020-00042-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41233-020-00042-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41233-020-00042-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,11,17]],"date-time":"2021-11-17T14:44:53Z","timestamp":1637160293000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41233-020-00042-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,22]]},"references-count":43,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["42"],"URL":"https:\/\/doi.org\/10.1007\/s41233-020-00042-1","relation":{},"ISSN":["2366-0139","2366-0147"],"issn-type":[{"type":"print","value":"2366-0139"},{"type":"electronic","value":"2366-0147"}],"subject":[],"published":{"date-parts":[[2020,11,22]]},"assertion":[{"value":"25 May 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 November 2020","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Compliance with ethical standards"}},{"value":"On behalf of all authors, the corresponding author states that there is no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of Interest"}}],"article-number":"2"}}