{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T21:43:07Z","timestamp":1775079787244,"version":"3.50.1"},"reference-count":38,"publisher":"Acoustical Society of America (ASA)","issue":"4","content-domain":{"domain":["pubs.aip.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2007,10,1]]},"abstract":"<jats:p>This paper addresses the problem of automatic identification of vowels uttered in isolation by female and child speakers. In this case, the magnitude spectrum of voiced vowels is sparsely sampled since only frequencies at integer multiples of F0 are significant. This degrades the performance of vowel identification techniques that either ignore pitch or rely on global shape models. A new pitch-dependent approach to vowel identification is proposed that emerges from the concept of timbre and that defines perceptual spectral clusters (PSC) of harmonic partials. A representative set of static PSC-related features is estimated and their performance is evaluated in automatic classification tests using the Mahalanobis distance. Linear prediction features and Mel-frequency cepstral coefficients (MFCC) are used as a reference and a database of five (Portuguese) natural vowel sounds uttered by 44 speakers (including 27 child speakers) is used for training and testing the Gaussian models. Results indicate that perceptual spectral cluster (PSC) features perform better than plain linear prediction features, but perform slightly worse than MFCC features. 
However, PSC features have the potential to take full advantage of the pitch structure of voiced vowels, namely in the analysis of concurrent voices, or by using pitch as a normalization parameter.<\/jats:p>","DOI":"10.1121\/1.2772228","type":"journal-article","created":{"date-parts":[[2007,9,26]],"date-time":"2007-09-26T22:21:15Z","timestamp":1190845275000},"page":"2389-2404","update-policy":"https:\/\/doi.org\/10.1063\/aip-crossmark-policy-page","source":"Crossref","is-referenced-by-count":7,"title":["Static features in real-time recognition of isolated vowels at high pitch"],"prefix":"10.1121","volume":"122","author":[{"given":"An\u00edbal J. S.","family":"Ferreira","sequence":"first","affiliation":[{"name":"University of Porto Department of Electrical and Computer Engineering, , Rua Dr. Roberto Frias s\/n, 4200-465 Porto, Portugal"}]}],"member":"231","reference":[{"key":"2023080404545923100_c1","first-page":"95","article-title":"Arguments against formants in the auditory representation of speech","volume-title":"The Representation of Speech in the Peripheral Auditory System","author":"Carlson","year":"1982"},{"key":"2023080404545923100_c2","first-page":"I281","article-title":"Robust formant tracking in noise","year":"2002"},{"key":"2023080404545923100_c3","first-page":"I581","article-title":"Formant frequency estimation in noise","year":"2004"},{"key":"2023080404545923100_c5","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1016\/S0378-5955(00)00113-1","article-title":"The center of gravity effect in vowel spectra and critical distance between the formants: Psychoacoustical study of perception of vowel-like stimuli","volume":"1","year":"1979","journal-title":"Hear. 
Res."},{"key":"2023080404545923100_c6a","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1109\/TASSP.1980.1163420","article-title":"Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences","volume":"28","year":"1980","journal-title":"IEEE Trans. Acoust., Speech, Signal Process."},{"key":"2023080404545923100_c6b","doi-asserted-by":"publisher","first-page":"3497","DOI":"10.1121\/1.424675","article-title":"Missing-data model of vowel identification","volume":"105","year":"1999","journal-title":"J. Acoust. Soc. Am."},{"key":"2023080404545923100_c7","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1006\/jpho.1996.0011","article-title":"On explaining certain male-female differences in the phonetic realization of vowel categories","volume":"24","year":"1996","journal-title":"J. Phonetics"},{"key":"2023080404545923100_c8","first-page":"1233","article-title":"On integrating insights from human speech perception into automatic speech recognition","year":"2005"},{"key":"2023080404545923100_c9","volume-title":"Acoustic Theory of Speech Production","year":"1970"},{"key":"2023080404545923100_c10","first-page":"203","article-title":"Accurate and robust frequency estimation in the odft domain","year":"2005"},{"key":"2023080404545923100_c11","article-title":"Audio spectral coder","year":"1996"},{"key":"2023080404545923100_c12","article-title":"Perceptual coding of harmonic signals","year":"1996"},{"key":"2023080404545923100_c13","unstructured":"Ferreira, A. J. S. (1998). \u201cSpectral coding and post-processing of high quality audio,\u201d Ph.D. 
thesis, Faculdade de Engenharia da Universidade do Porto-Portugal, Porto, Portugal, http:\/\/telecom.inescn.pt\/doc\/phd_en.html (last viewed on May 12th 2007)."},{"key":"2023080404545923100_c14","first-page":"47","article-title":"Accurate estimation in the odft domain of the frequency, phase and magnitude of stationary sinusoids","year":"2001"},{"key":"2023080404545923100_c15","first-page":"345","article-title":"New signal features for robust identification of isolated vowels","year":"2005"},{"key":"2023080404545923100_c16","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1080\/09298210600834946","article-title":"Statistical evaluation of music information retrieval experiments","volume":"35","year":"2006","journal-title":"J. New Music Res."},{"issue":"6","key":"2023080404545923100_c16a","doi-asserted-by":"publisher","first-page":"1496","DOI":"10.1121\/1.1914448","article-title":"An optimum processor theory for the central formation of the pitch of complex tones","volume":"54","year":"1973","journal-title":"J. Acoust. Soc. Am."},{"key":"2023080404545923100_c17","doi-asserted-by":"publisher","first-page":"1738","DOI":"10.1121\/1.399423","article-title":"Perceptual linear predictive (plp) analysis of speech","volume":"87","year":"1990","journal-title":"J. Acoust. Soc. Am."},{"key":"2023080404545923100_c18","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1016\/0167-6393(85)90045-7","article-title":"Low-dimensional representation of vowels based on all-pole modeling in the psychophysical domain","volume":"4","year":"1985","journal-title":"Speech Commun."},{"key":"2023080404545923100_c19","volume-title":"Pitch Determination of Speech Signals\u2014Algorithms and Devices","year":"1983"},{"key":"2023080404545923100_c20","doi-asserted-by":"publisher","first-page":"1044","DOI":"10.1121\/1.1513647","article-title":"A narrow band pattern-matching model of vowel perception","volume":"113","year":"2003","journal-title":"J. Acoust. Soc. 
Am."},{"key":"2023080404545923100_c21","doi-asserted-by":"publisher","first-page":"4041","DOI":"10.1121\/1.2188369","article-title":"Speech perception based on spectral peaks versus spectral shape","volume":"119","year":"2006","journal-title":"J. Acoust. Soc. Am."},{"key":"2023080404545923100_c22","first-page":"2997","article-title":"Revisiting perceptual linear prediction (plp)","year":"2005"},{"key":"2023080404545923100_c23","first-page":"1278","article-title":"Prediction of perceived phonetic distance from critical-band spectra\u2014a first step","year":"1982"},{"key":"2023080404545923100_c24","volume-title":"Principles of Experimental Phonetics","author":"Lass","year":"1996"},{"key":"2023080404545923100_c25","first-page":"66","article-title":"Are measured differences between the formants of men, women and children due to f0 differences?","volume":"21","year":"1992","journal-title":"J. Int. Phonetic Assoc."},{"key":"2023080404545923100_c26","doi-asserted-by":"publisher","first-page":"1062","DOI":"10.1121\/1.1943907","article-title":"Evaluating models of vowel perception","volume":"118","year":"2005","journal-title":"J. Acoust. Soc. Am."},{"key":"2023080404545923100_c27","volume-title":"An Introduction to the Psychology of Hearing","year":"1989"},{"key":"2023080404545923100_c28","doi-asserted-by":"publisher","first-page":"3843","DOI":"10.1121\/1.417240","article-title":"Vowel classification in children","volume":"100","year":"1996","journal-title":"J. Acoust. Soc. Am."},{"key":"2023080404545923100_c29","volume-title":"The Intelligent Ear\u2014On the Nature of Sound Perception","year":"2002"},{"key":"2023080404545923100_c30","volume-title":"Fundamentals of Speech Recognition","year":"1993"},{"key":"2023080404545923100_c31","doi-asserted-by":"publisher","first-page":"1631","DOI":"10.1121\/1.388499","article-title":"Fundamental frequency and vowel perception","volume":"72","year":"1982","journal-title":"J. Acoust. Soc. 
Am."},{"key":"2023080404545923100_c31a","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1016\/j.specom.2004.11.009","article-title":"Human and machine consonant recognition","volume":"45","year":"2005","journal-title":"Speech Commun."},{"key":"2023080404545923100_c32","first-page":"280","article-title":"Vowel identification in singing at high pitch","year":"2000"},{"key":"2023080404545923100_c33","volume-title":"Multirate Systems and Filter Banks","year":"1993"},{"key":"2023080404545923100_c34","doi-asserted-by":"publisher","first-page":"1781","DOI":"10.1121\/1.1781620","article-title":"Evaluation of formant-like features on an automatic vowel classification task","volume":"116","year":"2004","journal-title":"J. Acoust. Soc. Am."},{"key":"2023080404545923100_c35","doi-asserted-by":"publisher","first-page":"1966","DOI":"10.1121\/1.407520","article-title":"Spectral-shape features versus formants as acoustic correlates for vowels","volume":"94","year":"1993","journal-title":"J. Acoust. Soc. Am."},{"key":"2023080404545923100_c36","doi-asserted-by":"publisher","first-page":"248","DOI":"10.1121\/1.1908630","article-title":"Subdivision of the audible frequency range into critical bands","volume":"33","year":"1961","journal-title":"J. Acoust. Soc. 
Am."}],"container-title":["The Journal of the Acoustical Society of America"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/pubs.aip.org\/asa\/jasa\/article-pdf\/122\/4\/2389\/15282771\/2389_1_online.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/pubs.aip.org\/asa\/jasa\/article-pdf\/122\/4\/2389\/15282771\/2389_1_online.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T05:09:20Z","timestamp":1691125760000},"score":1,"resource":{"primary":{"URL":"https:\/\/pubs.aip.org\/jasa\/article\/122\/4\/2389\/982695\/Static-features-in-real-time-recognition-of"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,10,1]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2007,10,1]]}},"URL":"https:\/\/doi.org\/10.1121\/1.2772228","relation":{},"ISSN":["0001-4966","1520-8524"],"issn-type":[{"value":"0001-4966","type":"print"},{"value":"1520-8524","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2007,10]]},"published":{"date-parts":[[2007,10,1]]}}}