{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T18:04:20Z","timestamp":1754157860053,"version":"3.41.2"},"reference-count":19,"publisher":"Emerald","issue":"2","license":[{"start":{"date-parts":[[2006,3,1]],"date-time":"2006-03-01T00:00:00Z","timestamp":1141171200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2006,3,1]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-heading\">Purpose<\/jats:title><jats:p>This paper seeks to propose a new non\u2010intrusive method for the assessment of speech quality of voice communication systems and evaluate its performance.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Design\/methodology\/approach<\/jats:title><jats:p>The method is based on measuring perception\u2010based objective auditory distances between the voiced parts of the output speech to appropriately matching references extracted from a pre\u2010formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech records. The auditory distances are then mapped into equivalent subjective mean opinion scores (MOSs). The required clustering and matching processes are achieved by an efficient data\u2010mining tool known as the self\u2010organizing map (SOM). 
The proposed method was examined using a wide range of distortion including speech compression, wireless channel impairments, VoIP channel impairments, and modifications to the signal from features such as AGC.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Findings<\/jats:title><jats:p>The experimental results reported indicate that the proposed method provides a high level of accuracy in predicting the actual subjective quality of the speech. Specifically, the second version of the method, which is based on the use of bark spectrum (BS) analysis, is more accurate in predicting the MOS scores compared with its first and third versions (which are based on BS analysis and mel frequency cepstrum coefficients (MFCC), respectively), and outperforms the ITU\u2010T PESQ in a large number of test cases, particularly those related to distortion caused by channel impairments and signal level modifications.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Research limitations\/implications<\/jats:title><jats:p>It is believed that the prototype developed of the proposed objective speech quality measure is sufficiently accurate and robust against speaker, utterance and distortion type variations. Nevertheless, there are still possible directions for further improvements and enhancement. 
In general there are three areas that could be pursued for further improvements: widening the coverage of speaker variations of the system's codebook; formulating and using a perceptual speech model that provides true speaker\u2010independent representation of speech; and implementing the proposed measure as a stand\u2010alone system, preferably for real\u2010time applications.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Practical implications<\/jats:title><jats:p>Being an output\u2010based method, the proposed method can be employed for monitoring and assessing telecommunications networks under both live traffic conditions and off\u2010line evaluation.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Originality\/value<\/jats:title><jats:p>The main contribution of this paper is the introduction of a new output\u2010based, non\u2010intrusive method for the assessment of speech quality that is sufficiently accurate and robust. To the best of the author's knowledge, no reliable output\u2010based objective speech quality assessment method has to date been reported or formally recognised.<\/jats:p><\/jats:sec>","DOI":"10.1108\/17410390610645058","type":"journal-article","created":{"date-parts":[[2006,7,3]],"date-time":"2006-07-03T15:24:35Z","timestamp":1151940275000},"page":"148-164","source":"Crossref","is-referenced-by-count":11,"title":["Perceptual non\u2010intrusive speech quality assessment using a self\u2010organizing map"],"prefix":"10.1108","volume":"19","author":[{"given":"Abdulhussain E.","family":"Mahdi","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"140","reference":[{"key":"key2022012920305021900_b1","unstructured":"Beerends, J.G. and Stemerdink, J.A. (1992), \u201cA perceptual audio quality measure based on a psychoacoustic sound representation\u201d, Journal of the Audio Engineering Society, Vol. 40 No. 12, pp. 
963\u201074."},{"key":"key2022012920305021900_b2","doi-asserted-by":"crossref","unstructured":"Conway, A.E. (2004), \u201cOutput\u2010based method of applying PESQ to measure the perceptual quality of framed speech signals\u201d, Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC 2004), Atlanta, GA, 21\u201025 March, pp. 2521\u20106.","DOI":"10.1109\/WCNC.2004.1311485"},{"key":"key2022012920305021900_b3","doi-asserted-by":"crossref","unstructured":"Gopalan, K., Anderson, T.R. and Cupples, E.J. (1999), \u201cA comparison of speaker identification results using features based on cepstrum and Fourier\u2010Bessel expansion\u201d, IEEE Transactions on Speech and Audio Processing, Vol. 7 No. 3, pp. 289\u201094.","DOI":"10.1109\/89.759036"},{"key":"key2022012920305021900_b4","doi-asserted-by":"crossref","unstructured":"Gersho, A. and Gray, R.M. (1992), Vector Quantization and Signal Compression, Kluwer, Norwell, MA.","DOI":"10.1007\/978-1-4615-3626-0"},{"key":"key2022012920305021900_b5","doi-asserted-by":"crossref","unstructured":"Hermansky, H. (1990), \u201cPerceptual linear prediction (PLP) analysis of speech\u201d, Journal of the Acoustical Society of America, Vol. 87 No. 4, pp. 
1738\u201053.","DOI":"10.1121\/1.399423"},{"key":"key2022012920305021900_b6","unstructured":"ITU\u2010T (1996a), \u201cMethods for subjective determination of speech quality\u201d, ITU\u2010T Recommendation P.800, Telecommunication Standardization Sector, International Telecommunication Union, Geneva."},{"key":"key2022012920305021900_b7","unstructured":"ITU\u2010T (1996b), \u201cObjective quality measurement of telephone\u2010band (300\u20103,400\u2009Hz) speech codecs\u201d, ITU\u2010T Recommendation P.861, Telecommunication Standardization Sector, International Telecommunication Union, Geneva."},{"key":"key2022012920305021900_b8","unstructured":"ITU\u2010T (2001), \u201cPerceptual evaluation of speech quality (PESQ): an objective method for end\u2010to\u2010end speech quality assessment of narrowband telephone networks and speech codecs\u201d, ITU\u2010T Recommendation P.862, Telecommunication Standardization Sector, International Telecommunication Union, Geneva."},{"key":"key2022012920305021900_b9","unstructured":"Karjalainen, M. (1985), \u201cA new auditory model for the evaluation of sound quality of audio systems\u201d, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 85), Tampa, Florida, 26 March, pp. 608\u201011."},{"key":"key2022012920305021900_b10","doi-asserted-by":"crossref","unstructured":"Kubin, G., Atal, B.S. and Kleijn, W.B. (1993), \u201cPerformance of noise excitation for unvoiced speech\u201d, Proceedings of the IEEE Workshop on Speech Coding for Telecommunications, Sainte\u2010Adele, Canada, 13\u201015 October, pp. 35\u20106.","DOI":"10.1109\/SCFT.1993.762326"},{"key":"key2022012920305021900_b11","doi-asserted-by":"crossref","unstructured":"Picovici, D. and Mahdi, A.E. 
(2003), \u201cOutput\u2010based objective speech quality measure using self\u2010organizing map\u201d, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong, 6\u201010 April, Vol. I, pp. 476\u20109.","DOI":"10.1109\/ICASSP.2003.1198821"},{"key":"key2022012920305021900_b12","unstructured":"Quatieri, T.E. (2002), Discrete\u2010Time Speech Signal Processing: Principles and Practice, Prentice\u2010Hall, Englewood Cliffs, NJ."},{"key":"key2022012920305021900_b13","doi-asserted-by":"crossref","unstructured":"Rafila, K.S. and Dawoud, D.S. (1989), \u201cVoiced\/unvoiced\/mixed excitation classification of speech using the autocorrelation of the output of an ADPCM system\u201d, Proceedings of the IEEE International Conference on Systems Engineering, Fairborn, OH, 24\u201026 August, pp. 537\u201040.","DOI":"10.1109\/ICSYSE.1989.48733"},{"key":"key2022012920305021900_b14","doi-asserted-by":"crossref","unstructured":"Rix, A.W. (2004), \u201cPerceptual speech quality assessment \u2013 a review\u201d, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2004), Montreal, Canada, 17\u201021 May, Vol. III, pp. 1056\u20109.","DOI":"10.1109\/ICASSP.2004.1326730"},{"key":"key2022012920305021900_b15","unstructured":"Rix, A.W. and Hollier, M.P. (2000), \u201cThe perceptual analysis measurement system for robust end\u2010to\u2010end speech quality assessment\u201d, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2000), Sydney, Australia, 12\u201015 June, Vol. 3, pp. 1515\u201018."},{"key":"key2022012920305021900_b16","doi-asserted-by":"crossref","unstructured":"Schroeder, M.R., Atal, B.S. and Hall, J.L. (1979), \u201cOptimizing digital speech coders by exploiting masking properties of the human ear\u201d, Journal of the Acoustical Society of America, Vol. 66 No. 6, pp. 
1647\u201052.","DOI":"10.1121\/1.383662"},{"key":"key2022012920305021900_b17","doi-asserted-by":"crossref","unstructured":"Thorpe, L. and Yang, W. (1999), \u201cPerformance of current perceptual objective speech quality measures\u201d, Proceedings of the IEEE Workshop on Speech Coding, Porvoo, Finland, 20\u201023 June, pp. 144\u20106.","DOI":"10.1109\/SCFT.1999.781512"},{"key":"key2022012920305021900_b18","doi-asserted-by":"crossref","unstructured":"Vesanto, J. and Alhoniemi, E. (2000), \u201cClustering of the self\u2010organizing map\u201d, IEEE Transactions on Neural Networks, Vol. 11 No. 3, pp. 586\u2010600.","DOI":"10.1109\/72.846731"},{"key":"key2022012920305021900_b19","doi-asserted-by":"crossref","unstructured":"Wang, S., Sekey, A. and Gersho, A. (1992), \u201cAn objective measure for predicting subjective quality of speech coders\u201d, IEEE Journal on Selected Areas in Communications, Vol. 10 No. 5, pp. 819\u201029.","DOI":"10.1109\/49.138987"}],"container-title":["Journal of Enterprise Information 
Management"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/www.emeraldinsight.com\/doi\/full-xml\/10.1108\/17410390610645058","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/17410390610645058\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/17410390610645058\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,25]],"date-time":"2025-07-25T00:19:01Z","timestamp":1753402741000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/jeim\/article\/19\/2\/148-164\/195835"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,3,1]]},"references-count":19,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2006,3,1]]}},"alternative-id":["10.1108\/17410390610645058"],"URL":"https:\/\/doi.org\/10.1108\/17410390610645058","relation":{},"ISSN":["1741-0398"],"issn-type":[{"type":"print","value":"1741-0398"}],"subject":[],"published":{"date-parts":[[2006,3,1]]}}}