{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T03:33:09Z","timestamp":1775791989439,"version":"3.50.1"},"reference-count":112,"publisher":"Public Library of Science (PLoS)","issue":"7","license":[{"start":{"date-parts":[[2022,7,19]],"date-time":"2022-07-19T00:00:00Z","timestamp":1658188800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100009708","name":"Novo Nordisk Fonden","doi-asserted-by":"publisher","award":["NNF17OC0027872"],"award-info":[{"award-number":["NNF17OC0027872"]}],"id":[{"id":"10.13039\/501100009708","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001732","name":"Danish National Research Foundation","doi-asserted-by":"crossref","award":["P1"],"award-info":[{"award-number":["P1"]}],"id":[{"id":"10.13039\/501100001732","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100021851","name":"William Demant Fonden","doi-asserted-by":"publisher","award":["Oticon Centre of Excellence for Hearing and Speech Sciences"],"award-info":[{"award-number":["Oticon Centre of Excellence for Hearing and Speech Sciences"]}],"id":[{"id":"10.13039\/501100021851","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Temporal synchrony between facial motion and acoustic modulations is a hallmark feature of audiovisual speech. The moving face and mouth during natural speech are known to be correlated with low-frequency acoustic envelope fluctuations (below 10 Hz), but the precise rates at which envelope information is synchronized with motion in different parts of the face are less clear. 
Here, we used regularized canonical correlation analysis (rCCA) to learn speech envelope filters whose outputs correlate with motion in different parts of the speaker's face. We leveraged recent advances in video-based 3D facial landmark estimation, allowing us to examine statistical envelope-face correlations across a large number of speakers (\u223c4000). Specifically, rCCA was used to learn modulation transfer functions (MTFs) for the speech envelope that significantly predict correlation with facial motion across different speakers. The AV analysis revealed bandpass speech envelope filters at distinct temporal scales. A first set of MTFs showed peaks around 3-4 Hz and were correlated with mouth movements. A second set of MTFs captured envelope fluctuations in the 1-2 Hz range correlated with more global face and head motion. These two distinctive timescales emerged only as a property of natural AV speech statistics across many speakers. A similar analysis of fewer speakers performing a controlled speech task highlighted only the well-known temporal modulations around 4 Hz correlated with orofacial motion. The different bandpass ranges of AV correlation align notably with the average rates at which syllables (3-4 Hz) and phrases (1-2 Hz) are produced in natural speech. Whereas periodicities at the syllable rate are evident in the envelope spectrum of the speech signal itself, slower 1-2 Hz regularities thus only become prominent when considering crossmodal signal statistics. 
This may indicate a motor origin of temporal regularities at the timescales of syllables and phrases in natural speech.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1010273","type":"journal-article","created":{"date-parts":[[2022,7,19]],"date-time":"2022-07-19T17:25:10Z","timestamp":1658251510000},"page":"e1010273","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":9,"title":["Modulation transfer functions for audiovisual speech"],"prefix":"10.1371","volume":"18","author":[{"given":"Nicolai F.","family":"Pedersen","sequence":"first","affiliation":[]},{"given":"Torsten","family":"Dau","sequence":"additional","affiliation":[]},{"given":"Lars Kai","family":"Hansen","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3724-3332","authenticated-orcid":true,"given":"Jens","family":"Hjortkj\u00e6r","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,7,19]]},"reference":[{"issue":"5588","key":"pcbi.1010273.ref001","doi-asserted-by":"crossref","first-page":"746","DOI":"10.1038\/264746a0","article-title":"Hearing lips and seeing voices","volume":"264","author":"H McGurk","year":"1976","journal-title":"Nature"},{"issue":"2","key":"pcbi.1010273.ref002","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1121\/1.1907309","article-title":"Visual contribution to speech intelligibility in noise","volume":"26","author":"WH Sumby","year":"1954","journal-title":"The journal of the acoustical society of america"},{"issue":"9","key":"pcbi.1010273.ref003","doi-asserted-by":"crossref","first-page":"1306","DOI":"10.1109\/JPROC.2003.817150","article-title":"Recent advances in the automatic recognition of audiovisual speech","volume":"91","author":"G Potamianos","year":"2003","journal-title":"Proceedings of the IEEE"},{"key":"pcbi.1010273.ref004","doi-asserted-by":"crossref","unstructured":"Ephrat A, Mosseri I, Lang O, Dekel T, Wilson K, 
Hassidim A, et al. Looking to listen at the cocktail party: A speaker-independent audio-visual model for speech separation. arXiv preprint arXiv:180403619. 2018;.","DOI":"10.1145\/3197517.3201357"},{"key":"pcbi.1010273.ref005","first-page":"123","article-title":"The moving face during speech communication","author":"KG Munhall","year":"1998","journal-title":"Hearing by eye II: Advances in the psychology of speechreading and auditory-visual speech"},{"key":"pcbi.1010273.ref006","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1000436","article-title":"The Natural Statistics of Audiovisual Speech","volume":"5","author":"C Chandrasekaran","year":"2009","journal-title":"PLoS Computational Biology"},{"key":"pcbi.1010273.ref007","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1016\/j.neubiorev.2017.02.011","article-title":"Temporal modulations in speech and music","volume":"81","author":"N Ding","year":"2017","journal-title":"Neuroscience & Biobehavioral Reviews"},{"issue":"6","key":"pcbi.1010273.ref008","doi-asserted-by":"crossref","first-page":"1119","DOI":"10.1044\/1092-4388(2002\/090)","article-title":"Articulatory Movements in Adolescents: Evidence for Protracted Development of Speech Motor Control Process","volume":"45","author":"B Walsh","year":"2002","journal-title":"Journal of Speech, Language, and Hearing Research"},{"issue":"1","key":"pcbi.1010273.ref009","doi-asserted-by":"crossref","first-page":"5","DOI":"10.52010\/ijom.2007.33.1.1","article-title":"Tongue control for speech and swallowing in healthy younger and older subjects","volume":"33","author":"JW Bennett","year":"2007","journal-title":"International Journal of Orofacial Myology and Myofunctional Therapy"},{"issue":"1-2","key":"pcbi.1010273.ref010","doi-asserted-by":"crossref","first-page":"36","DOI":"10.3109\/14015439109099172","article-title":"Mandibular movements in speech phrases\u2014A syllabic quasiregular continuous oscillation","volume":"16","author":"P 
Lindblad","year":"1991","journal-title":"Scandinavian Journal of Logopedics and Phoniatrics"},{"issue":"4","key":"pcbi.1010273.ref011","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1016\/j.archoralbio.2010.02.008","article-title":"Kinematic linkage of the tongue, jaw, and hyoid during eating and speech","volume":"55","author":"K Matsuo","year":"2010","journal-title":"Archives of oral biology"},{"key":"pcbi.1010273.ref012","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1016\/B978-0-12-248550-3.50032-5","article-title":"The temporal regulation of speech","author":"JJ Ohala","year":"1975","journal-title":"Auditory analysis and perception of speech"},{"key":"pcbi.1010273.ref013","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1353\/lan.2011.0057","article-title":"A cross-language perspective on speech information rate","author":"F Pellegrino","year":"2011","journal-title":"Language"},{"issue":"2","key":"pcbi.1010273.ref014","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1017\/S0954394509990093","article-title":"Articulation rate across dialect, age, and gender","volume":"21","author":"E Jacewicz","year":"2009","journal-title":"Language variation and change"},{"issue":"4","key":"pcbi.1010273.ref015","doi-asserted-by":"crossref","first-page":"1976","DOI":"10.1121\/1.5006179","article-title":"A cross-linguistic study of speech modulation spectra","volume":"142","author":"L Varnet","year":"2017","journal-title":"The Journal of the Acoustical Society of America"},{"issue":"3-4","key":"pcbi.1010273.ref016","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1016\/j.wocn.2003.09.005","article-title":"Temporal properties of spontaneous speech\u2014a syllable-centric perspective","volume":"31","author":"S Greenberg","year":"2003","journal-title":"Journal of Phonetics"},{"issue":"6","key":"pcbi.1010273.ref017","doi-asserted-by":"crossref","first-page":"3394","DOI":"10.1121\/1.1624067","article-title":"Modulation spectra of 
natural sounds and ethological theories of auditory processing","volume":"114","author":"NC Singh","year":"2003","journal-title":"The Journal of the Acoustical Society of America"},{"key":"pcbi.1010273.ref018","doi-asserted-by":"crossref","unstructured":"Kuratate T, Munhall KG, Rubin PE, Vatikiotis-Bateson E, Yehia H. Audio-visual synthesis of talking faces from speech production correlates. In: Sixth European Conference on Speech Communication and Technology; 1999.","DOI":"10.21437\/Eurospeech.1999-300"},{"issue":"1-2","key":"pcbi.1010273.ref019","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1016\/S0167-6393(98)00048-X","article-title":"Quantitative association of vocal-tract and facial behavior","volume":"26","author":"H Yehia","year":"1998","journal-title":"Speech Communication"},{"issue":"3","key":"pcbi.1010273.ref020","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1006\/jpho.2002.0165","article-title":"Linking facial animation, head motion and speech acoustics","volume":"30","author":"HC Yehia","year":"2002","journal-title":"Journal of Phonetics"},{"issue":"11","key":"pcbi.1010273.ref021","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/S1110865702206046","article-title":"On the relationship between face movements, tongue movements, and speech acoustics","volume":"2002","author":"J Jiang","year":"2002","journal-title":"EURASIP Journal on Advances in Signal Processing"},{"issue":"1","key":"pcbi.1010273.ref022","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1121\/1.4939496","article-title":"A multimodal spectral approach to characterize rhythm in natural speech","volume":"139","author":"AM Alexandrou","year":"2016","journal-title":"The Journal of the Acoustical Society of America"},{"key":"pcbi.1010273.ref023","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1016\/j.specom.2013.09.008","article-title":"Gesture and speech in interaction: An overview","volume":"57","author":"P 
Wagner","year":"2014","journal-title":"Speech Communication"},{"issue":"1-2","key":"pcbi.1010273.ref024","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1016\/0167-9457(83)90004-0","article-title":"Kinematics of head movements accompanying speech during conversation","volume":"2","author":"U Hadar","year":"1983","journal-title":"Human Movement Science"},{"issue":"3","key":"pcbi.1010273.ref025","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1016\/0167-9457(84)90018-6","article-title":"The timing of shifts of head postures during conversation","volume":"3","author":"U Hadar","year":"1984","journal-title":"Human Movement Science"},{"issue":"1","key":"pcbi.1010273.ref026","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1023\/A:1023274823974","article-title":"Pitch and manual gestures","volume":"27","author":"E McClave","year":"1998","journal-title":"Journal of Psycholinguistic Research"},{"key":"pcbi.1010273.ref027","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1016\/j.specom.2013.06.003","article-title":"Tracking eyebrows and head gestures associated with spoken prosody","volume":"57","author":"J Kim","year":"2014","journal-title":"Speech Communication"},{"issue":"2-3","key":"pcbi.1010273.ref028","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1177\/0023830909103167","article-title":"Are eyebrow movements linked to voice variations and turn-taking in dialogue? 
An experimental investigation","volume":"52","author":"I Gua\u00eftella","year":"2009","journal-title":"Language and speech"},{"issue":"2","key":"pcbi.1010273.ref029","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1111\/j.0963-7214.2004.01502010.x","article-title":"Visual prosody and speech intelligibility: Head movement improves auditory speech perception","volume":"15","author":"KG Munhall","year":"2004","journal-title":"Psychological science"},{"issue":"21","key":"pcbi.1010273.ref030","doi-asserted-by":"crossref","first-page":"11364","DOI":"10.1073\/pnas.2004163117","article-title":"Acoustic information about upper limb movement in voicing","volume":"117","author":"W Pouw","year":"2020","journal-title":"Proceedings of the National Academy of Sciences"},{"issue":"1","key":"pcbi.1010273.ref031","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1123\/mcj.15.1.5","article-title":"Limb versus speech motor control: A conceptual review","volume":"15","author":"B Grimme","year":"2011","journal-title":"Motor control"},{"key":"pcbi.1010273.ref032","doi-asserted-by":"crossref","unstructured":"Vatikiotis-Bateson E, Munhall KG, Kasahara Y, Garcia F, Yehia H. Characterizing audiovisual information during speech. In: Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP\u201996. vol. 3. IEEE; 1996. p. 
1485\u20131488.","DOI":"10.21437\/ICSLP.1996-379"},{"issue":"2","key":"pcbi.1010273.ref033","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1037\/xge0000646","article-title":"Gesture\u2013speech physics: The biomechanical basis for the emergence of gesture\u2013speech synchrony","volume":"149","author":"W Pouw","year":"2020","journal-title":"Journal of Experimental Psychology: General"},{"issue":"4","key":"pcbi.1010273.ref034","doi-asserted-by":"crossref","first-page":"670","DOI":"10.1044\/jshr.3104.670","article-title":"Task-specific organization of activity in human jaw muscles","volume":"31","author":"CA Moore","year":"1988","journal-title":"Journal of Speech, Language, and Hearing Research"},{"issue":"1","key":"pcbi.1010273.ref035","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/S0003-9969(01)00092-9","article-title":"Hyoid and tongue surface movements in speaking and eating","volume":"47","author":"KM Hiiemae","year":"2002","journal-title":"Archives of Oral Biology"},{"issue":"13","key":"pcbi.1010273.ref036","doi-asserted-by":"crossref","first-page":"1176","DOI":"10.1016\/j.cub.2012.04.055","article-title":"Cineradiography of monkey lip-smacking reveals putative precursors of speech dynamics","volume":"22","author":"AA Ghazanfar","year":"2012","journal-title":"Current Biology"},{"issue":"6","key":"pcbi.1010273.ref037","doi-asserted-by":"crossref","first-page":"3718","DOI":"10.1121\/1.414986","article-title":"Functional data analyses of lip motion","volume":"99","author":"JO Ramsay","year":"1996","journal-title":"The Journal of the Acoustical Society of America"},{"issue":"1","key":"pcbi.1010273.ref038","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1121\/1.1928807","article-title":"Empirical modeling of human face kinematics during speech using motion clustering","volume":"118","author":"JC Lucero","year":"2005","journal-title":"The Journal of the Acoustical Society of 
America"},{"key":"pcbi.1010273.ref039","first-page":"325","article-title":"Estimation and animation of faces using facial motion mapping and a 3D face database","author":"T Kuratate","year":"2005","journal-title":"Computer-graphic facial reconstruction"},{"issue":"4","key":"pcbi.1010273.ref040","doi-asserted-by":"crossref","first-page":"2283","DOI":"10.1121\/1.2973196","article-title":"Analysis of facial motion patterns during speech using a matrix factorization algorithm","volume":"124","author":"JC Lucero","year":"2008","journal-title":"The Journal of the Acoustical Society of America"},{"key":"pcbi.1010273.ref041","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1016\/j.neuroimage.2018.01.033","article-title":"Decoding the auditory brain with canonical component analysis","volume":"172","author":"A de Cheveign\u00e9","year":"2018","journal-title":"NeuroImage"},{"issue":"1","key":"pcbi.1010273.ref042","first-page":"66","article-title":"The modulation transfer function in room acoustics as a predictor of speech intelligibility","volume":"28","author":"T Houtgast","year":"1973","journal-title":"Acta Acustica United with Acustica"},{"issue":"5","key":"pcbi.1010273.ref043","doi-asserted-by":"crossref","first-page":"1364","DOI":"10.1121\/1.383531","article-title":"Temporal modulation transfer functions based upon modulation thresholds","volume":"66","author":"NF Viemeister","year":"1979","journal-title":"The Journal of the Acoustical Society of America"},{"issue":"6","key":"pcbi.1010273.ref044","doi-asserted-by":"crossref","first-page":"3615","DOI":"10.1121\/1.414959","article-title":"A quantitative model of the \u201ceffective\u201d signal processing in the auditory system. I. 
Model structure","volume":"99","author":"T Dau","year":"1996","journal-title":"The Journal of the Acoustical Society of America"},{"issue":"3","key":"pcbi.1010273.ref045","doi-asserted-by":"crossref","first-page":"e1000302","DOI":"10.1371\/journal.pcbi.1000302","article-title":"The modulation transfer function for speech intelligibility","volume":"5","author":"TM Elliott","year":"2009","journal-title":"PLoS comput biol"},{"key":"pcbi.1010273.ref046","first-page":"595","article-title":"Neural coding of the temporal envelope of speech: relation to modulation transfer functions","author":"B Delgutte","year":"1998","journal-title":"Psychophysical and physiological advances in hearing"},{"key":"pcbi.1010273.ref047","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.heares.2013.08.017","article-title":"Syllabic (2\u20135 Hz) and fluctuation (1\u201310 Hz) ranges in speech and auditory processing","volume":"305","author":"E Edwards","year":"2013","journal-title":"Hearing research"},{"issue":"51","key":"pcbi.1010273.ref048","doi-asserted-by":"crossref","first-page":"32791","DOI":"10.1073\/pnas.2006192117","article-title":"The interrelationship between the face and vocal tract configuration during audiovisual speech","volume":"117","author":"C Scholes","year":"2020","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"pcbi.1010273.ref049","doi-asserted-by":"crossref","first-page":"137","DOI":"10.21248\/zaspil.42.2005.276","article-title":"On the complex nature of speech kinematics","volume":"42","author":"S Fuchs","year":"2005","journal-title":"ZAS papers in Linguistics"},{"key":"pcbi.1010273.ref050","unstructured":"Afouras T, Chung JS, Zisserman A. LRS3-TED: a large-scale dataset for visual speech recognition. 
In: arXiv preprint arXiv:1809.00496; 2018."},{"issue":"5","key":"pcbi.1010273.ref051","doi-asserted-by":"crossref","first-page":"2421","DOI":"10.1121\/1.2229005","article-title":"An audio-visual corpus for speech perception and automatic speech recognition","volume":"120","author":"M Cooke","year":"2006","journal-title":"The Journal of the Acoustical Society of America"},{"key":"pcbi.1010273.ref052","unstructured":"Patterson RD, Nimmo-Smith I, Holdsworth J, Rice P. An efficient auditory filterbank based on the gammatone function. In: A meeting of the IOC Speech Group on Auditory Modelling at RSRE. vol. 2; 1987."},{"key":"pcbi.1010273.ref053","doi-asserted-by":"crossref","unstructured":"Bulat A, Tzimiropoulos G. How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks). In: International Conference on Computer Vision; 2017.","DOI":"10.1109\/ICCV.2017.116"},{"key":"pcbi.1010273.ref054","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1016\/j.neuroimage.2013.10.067","article-title":"On the interpretation of weight vectors of linear models in multivariate neuroimaging","volume":"87","author":"S Haufe","year":"2014","journal-title":"Neuroimage"},{"issue":"4","key":"pcbi.1010273.ref055","doi-asserted-by":"crossref","first-page":"046040","DOI":"10.1088\/1741-2552\/abf771","article-title":"Auditory stimulus-response modeling with a match-mismatch task","volume":"18","author":"A de Cheveign\u00e9","year":"2021","journal-title":"Journal of Neural Engineering"},{"key":"pcbi.1010273.ref056","unstructured":"Head T, MechCoder, Louppe G, Shcherbatyi I, fcharras, Vin\u00edcius Z, et al. scikit-optimize\/scikit-optimize: v0.5.2; 2018. 
Available from: https:\/\/doi.org\/10.5281\/zenodo.1207017."},{"issue":"6","key":"pcbi.1010273.ref057","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1038\/s41583-020-0304-4","article-title":"Speech rhythms and their neural foundations","volume":"21","author":"D Poeppel","year":"2020","journal-title":"Nature reviews neuroscience"},{"issue":"4","key":"pcbi.1010273.ref058","doi-asserted-by":"crossref","first-page":"2173","DOI":"10.1121\/1.1784442","article-title":"A phenomenological model of peripheral and central neural responses to amplitude-modulated tones","volume":"116","author":"PC Nelson","year":"2004","journal-title":"The Journal of the Acoustical Society of America"},{"issue":"3","key":"pcbi.1010273.ref059","doi-asserted-by":"crossref","first-page":"1475","DOI":"10.1121\/1.3621502","article-title":"Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing","volume":"130","author":"S J\u00f8rgensen","year":"2011","journal-title":"The Journal of the Acoustical Society of America"},{"issue":"6","key":"pcbi.1010273.ref060","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1111\/j.1467-8721.2008.00615.x","article-title":"Speech perception as a multimodal phenomenon","volume":"17","author":"LD Rosenblum","year":"2008","journal-title":"Current Directions in Psychological Science"},{"issue":"1","key":"pcbi.1010273.ref061","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-020-72739-4","article-title":"Sequences of Intonation Units form a ~ 1 Hz rhythm","volume":"10","author":"M Inbar","year":"2020","journal-title":"Scientific reports"},{"issue":"1","key":"pcbi.1010273.ref062","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1515\/lp-2013-0004","article-title":"Speech rhythm and temporal structure: converging perspectives?","volume":"4","author":"U Goswami","year":"2013","journal-title":"Laboratory 
Phonology"},{"issue":"3","key":"pcbi.1010273.ref063","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1016\/j.jml.2007.06.005","article-title":"The effects of visual beats on prosodic prominence: Acoustic analyses, auditory perception and visual perception","volume":"57","author":"E Krahmer","year":"2007","journal-title":"Journal of memory and language"},{"issue":"4","key":"pcbi.1010273.ref064","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1017\/S0140525X98001265","article-title":"The frame\/content theory of evolution of speech production","volume":"21","author":"PF MacNeilage","year":"1998","journal-title":"Behavioral and brain sciences"},{"issue":"6","key":"pcbi.1010273.ref065","doi-asserted-by":"crossref","first-page":"1196","DOI":"10.1162\/jocn_a_00575","article-title":"Facial expressions and the evolution of the speech rhythm","volume":"26","author":"AA Ghazanfar","year":"2014","journal-title":"Journal of cognitive neuroscience"},{"issue":"5","key":"pcbi.1010273.ref066","doi-asserted-by":"crossref","first-page":"1018","DOI":"10.1002\/cne.24997","article-title":"Evolution of the speech-ready brain: The voice\/jaw connection in the human motor cortex","volume":"529","author":"S Brown","year":"2021","journal-title":"Journal of Comparative Neurology"},{"issue":"21","key":"pcbi.1010273.ref067","doi-asserted-by":"crossref","first-page":"4276","DOI":"10.1016\/j.cub.2020.08.019","article-title":"Theta synchronization of phonatory and articulatory systems in marmoset monkey vocal production","volume":"30","author":"C Risueno-Segovia","year":"2020","journal-title":"Current Biology"},{"issue":"3","key":"pcbi.1010273.ref068","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1016\/j.infbeh.2007.12.014","article-title":"Characteristics of the rhythmic organization of vocal babbling: Implications for an amodal linguistic rhythm","volume":"31","author":"JK Dolata","year":"2008","journal-title":"Infant Behavior and 
development"},{"issue":"1","key":"pcbi.1010273.ref069","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1111\/1467-7687.00147","article-title":"Co-occurrences of preverbal vocal behavior and motor action in early infancy","volume":"4","author":"K Ejiri","year":"2001","journal-title":"Developmental Science"},{"key":"pcbi.1010273.ref070","article-title":"Synchronization between preverbal vocal behavior and motor action in early infancy: II. An acoustical examination of the functional significance of the synchronization","author":"K Ejiri","year":"1999","journal-title":"Japanese Journal of Psychology"},{"issue":"11-12","key":"pcbi.1010273.ref071","first-page":"19","article-title":"Hand, mouth and brain. The dynamic emergence of speech and gesture","volume":"6","author":"JM Iverson","year":"1999","journal-title":"Journal of Consciousness studies"},{"issue":"4","key":"pcbi.1010273.ref072","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1111\/j.1467-8624.2004.00725.x","article-title":"Infant vocal\u2013motor coordination: precursor to the gesture\u2013speech system?","volume":"75","author":"JM Iverson","year":"2004","journal-title":"Child development"},{"key":"pcbi.1010273.ref073","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1016\/j.specom.2013.06.006","article-title":"Infants temporally coordinate gesture-speech combinations before they produce their first words","volume":"57","author":"N Esteve-Gibert","year":"2014","journal-title":"Speech Communication"},{"issue":"1","key":"pcbi.1010273.ref074","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1002\/dev.20009","article-title":"Development of functional synergies for speech motor coordination in childhood and adolescence","volume":"45","author":"A Smith","year":"2004","journal-title":"Developmental psychobiology"},{"issue":"10","key":"pcbi.1010273.ref075","doi-asserted-by":"crossref","first-page":"543","DOI":"10.1016\/j.tics.2014.06.004","article-title":"The evolution of speech: 
vision, rhythm, cooperation","volume":"18","author":"AA Ghazanfar","year":"2014","journal-title":"Trends in cognitive sciences"},{"key":"pcbi.1010273.ref076","article-title":"Gesture, speech, and computational stages: A reply to McNeill","author":"B Butterworth","year":"1989","journal-title":"Psychological Review"},{"key":"pcbi.1010273.ref077","unstructured":"McNeill D. Hand and mind. De Gruyter Mouton; 1992."},{"issue":"6708","key":"pcbi.1010273.ref078","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1038\/24300","article-title":"Why people gesture when they speak","volume":"396","author":"JM Iverson","year":"1998","journal-title":"Nature"},{"issue":"4","key":"pcbi.1010273.ref079","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1049\/ip-vis:20045112","article-title":"Realistic speech animation based on observed 3-D face dynamics","volume":"152","author":"P M\u00fcller","year":"2005","journal-title":"IEE Proceedings-Vision, Image and Signal Processing"},{"key":"pcbi.1010273.ref080","doi-asserted-by":"crossref","unstructured":"Graf HP, Cosatto E, Strom V, Huang FJ. Visual prosody: Facial movements accompanying speech. In: Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition. IEEE; 2002. p. 
396\u2013401.","DOI":"10.1109\/AFGR.2002.1004186"},{"key":"pcbi.1010273.ref081","doi-asserted-by":"crossref","first-page":"311","DOI":"10.3389\/fnhum.2014.00311","article-title":"Cortical entrainment to continuous speech: functional roles and interpretations","volume":"8","author":"N Ding","year":"2014","journal-title":"Frontiers in human neuroscience"},{"issue":"1","key":"pcbi.1010273.ref082","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1038\/nn.4186","article-title":"Cortical tracking of hierarchical linguistic structures in connected speech","volume":"19","author":"N Ding","year":"2016","journal-title":"Nature neuroscience"},{"issue":"3","key":"pcbi.1010273.ref083","doi-asserted-by":"crossref","first-page":"e2004473","DOI":"10.1371\/journal.pbio.2004473","article-title":"Perceptually relevant speech tracking in auditory and motor cortex reflects distinct linguistic features","volume":"16","author":"A Keitel","year":"2018","journal-title":"PLoS biology"},{"key":"pcbi.1010273.ref084","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1016\/j.neuroimage.2013.06.035","article-title":"Acoustic landmarks drive delta\u2013theta oscillations to enable speech comprehension by facilitating perceptual parsing","volume":"85","author":"KB Doelling","year":"2014","journal-title":"Neuroimage"},{"issue":"4","key":"pcbi.1010273.ref085","doi-asserted-by":"crossref","DOI":"10.1523\/ENEURO.0562-20.2021","article-title":"Acoustically Driven Cortical \u03b4 Oscillations Underpin Prosodic Chunking","volume":"8","author":"JM Rimmele","year":"2021","journal-title":"Eneuro"},{"issue":"4","key":"pcbi.1010273.ref086","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1038\/nn.3063","article-title":"Cortical oscillations and speech processing: emerging computational principles and operations","volume":"15","author":"AL Giraud","year":"2012","journal-title":"Nature 
neuroscience"},{"issue":"4","key":"pcbi.1010273.ref087","doi-asserted-by":"crossref","first-page":"454","DOI":"10.1016\/j.conb.2005.06.008","article-title":"Multisensory contributions to low-level, \u2018unisensory\u2019 processing","volume":"15","author":"CE Schroeder","year":"2005","journal-title":"Current opinion in neurobiology"},{"issue":"8","key":"pcbi.1010273.ref088","doi-asserted-by":"crossref","first-page":"e1000445","DOI":"10.1371\/journal.pbio.1000445","article-title":"Auditory cortex tracks both auditory and visual stimulus dynamics using low-frequency neuronal phase modulation","volume":"8","author":"H Luo","year":"2010","journal-title":"PLoS biology"},{"key":"pcbi.1010273.ref089","doi-asserted-by":"crossref","first-page":"e24763","DOI":"10.7554\/eLife.24763","article-title":"Contributions of local speech encoding and functional connectivity to audio-visual speech perception","volume":"6","author":"BL Giordano","year":"2017","journal-title":"Elife"},{"issue":"1-4","key":"pcbi.1010273.ref090","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/j.specom.2004.10.011","article-title":"Auditory speech detection in noise enhanced by lipreading","volume":"44","author":"LE Bernstein","year":"2004","journal-title":"Speech Communication"},{"issue":"3","key":"pcbi.1010273.ref091","doi-asserted-by":"crossref","first-page":"1197","DOI":"10.1121\/1.1288668","article-title":"The use of visible speech cues for improving auditory detection of spoken sentences","volume":"108","author":"KW Grant","year":"2000","journal-title":"The Journal of the Acoustical Society of America"},{"issue":"2","key":"pcbi.1010273.ref092","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1002\/(SICI)1099-0720(199604)10:2<121::AID-ACP371>3.0.CO;2-V","article-title":"Visible speech as a function of image quality: Effects of display parameters on lipreading ability","volume":"10","author":"M Vitkovitch","year":"1996","journal-title":"Applied cognitive 
psychology"},{"key":"pcbi.1010273.ref093","unstructured":"de Paula H, Yehia HC, Shiller D, Jozan G, Munhall K, Vatikiotis-Bateson E. Linking production and perception through spatial and temporal filtering of visible speech information. 6th ISSP. 2003; p. 37\u201342."},{"issue":"5","key":"pcbi.1010273.ref094","first-page":"873","article-title":"Contributions of oral and extraoral facial movement to visual and audiovisual speech perception","volume":"30","author":"SM Thomas","year":"2004","journal-title":"Journal of Experimental Psychology: Human Perception and Performance"},{"issue":"2","key":"pcbi.1010273.ref095","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1121\/1.408467","article-title":"Effect of temporal envelope smearing on speech reception","volume":"95","author":"R Drullman","year":"1994","journal-title":"The Journal of the Acoustical Society of America"},{"issue":"1","key":"pcbi.1010273.ref096","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-021-86725-x","article-title":"Synchronous facial action binds dynamic facial features","volume":"11","author":"A Johnston","year":"2021","journal-title":"Scientific Reports"},{"key":"pcbi.1010273.ref097","doi-asserted-by":"crossref","unstructured":"Ginosar S, Bar A, Kohavi G, Chan C, Owens A, Malik J. Learning individual styles of conversational gesture. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition; 2019. p. 3497\u20133506.","DOI":"10.1109\/CVPR.2019.00361"},{"key":"pcbi.1010273.ref098","doi-asserted-by":"crossref","unstructured":"Sigg C, Fischer B, Ommer B, Roth V, Buhmann J. Nonnegative CCA for audiovisual source separation. In: 2007 IEEE Workshop on Machine Learning for Signal Processing. IEEE; 2007. p. 253\u2013258.","DOI":"10.1109\/MLSP.2007.4414315"},{"key":"pcbi.1010273.ref099","unstructured":"Slaney M, Covell M. Facesync: A linear operator for measuring synchronization of video facial images and audio tracks. 
In: Advances in Neural Information Processing Systems; 2001. p. 814\u2013820."},{"issue":"7","key":"pcbi.1010273.ref100","doi-asserted-by":"crossref","first-page":"1396","DOI":"10.1109\/TMM.2007.906583","article-title":"Audiovisual synchronization and fusion using canonical correlation analysis","volume":"9","author":"ME Sargin","year":"2007","journal-title":"IEEE Transactions on Multimedia"},{"issue":"8","key":"pcbi.1010273.ref101","doi-asserted-by":"crossref","first-page":"2329","DOI":"10.1109\/TASL.2012.2201476","article-title":"Generating human-like behaviors using joint, speech-driven models for conversational agents","volume":"20","author":"S Mariooryad","year":"2012","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"issue":"7","key":"pcbi.1010273.ref102","doi-asserted-by":"crossref","first-page":"e1003743","DOI":"10.1371\/journal.pcbi.1003743","article-title":"No, there is no 150 ms lead of visual speech on auditory speech, but a range of audiovisual asynchronies varying from small audio lead to large audio lag","volume":"10","author":"JL Schwartz","year":"2014","journal-title":"PLoS Comput Biol"},{"issue":"1-2","key":"pcbi.1010273.ref103","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1007\/s10994-009-5153-3","article-title":"Temporal kernel CCA and its application in multimodal neuronal data analysis","volume":"79","author":"F Bie\u00dfmann","year":"2010","journal-title":"Machine Learning"},{"issue":"3","key":"pcbi.1010273.ref104","doi-asserted-by":"crossref","first-page":"2162","DOI":"10.1121\/1.3682040","article-title":"Quantifying time-varying coordination of multimodal speech signals using correlation map analysis","volume":"131","author":"A Vilela Barbosa","year":"2012","journal-title":"The Journal of the Acoustical Society of America"},{"issue":"2","key":"pcbi.1010273.ref105","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1007\/s00221-013-3507-3","article-title":"Multisensory temporal integration: 
task and stimulus dependencies","volume":"227","author":"RA Stevenson","year":"2013","journal-title":"Experimental brain research"},{"key":"pcbi.1010273.ref106","doi-asserted-by":"crossref","first-page":"509","DOI":"10.3389\/fpsyg.2015.00509","article-title":"The effects of processing and sequence organization on the timing of turn taking: a corpus study","volume":"6","author":"SG Roberts","year":"2015","journal-title":"Frontiers in psychology"},{"issue":"2","key":"pcbi.1010273.ref107","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1016\/j.tins.2019.12.006","article-title":"A hierarchy of autonomous systems for vocal production","volume":"43","author":"YS Zhang","year":"2020","journal-title":"Trends in neurosciences"},{"key":"pcbi.1010273.ref108","doi-asserted-by":"crossref","unstructured":"Trujillo JP, Levinson SC, Holler J. Visual Information in Computer-Mediated Interaction Matters: Investigating the Association Between the Availability of Gesture and Turn Transition Timing in Conversation. In: International Conference on Human-Computer Interaction. Springer; 2021. p. 643\u2013657.","DOI":"10.1007\/978-3-030-78468-3_44"},{"key":"pcbi.1010273.ref109","unstructured":"Barker JP, Berthommier F. Estimation of speech acoustics from visual speech features: A comparison of linear and non-linear models. In: AVSP\u201999-International Conference on Auditory-Visual Speech Processing; 1999."},{"issue":"10","key":"pcbi.1010273.ref110","doi-asserted-by":"crossref","first-page":"10762","DOI":"10.1364\/OE.18.010762","article-title":"Non-contact, automated cardiac pulse measurements using video imaging and blind source separation","volume":"18","author":"MZ Poh","year":"2010","journal-title":"Optics express"},{"key":"pcbi.1010273.ref111","doi-asserted-by":"crossref","unstructured":"Maki Y, Monno Y, Tanaka M, Okutomi M. Remote Heart Rate Estimation Based on 3D Facial Landmarks. 
In: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE; 2020. p. 2634\u20132637.","DOI":"10.1109\/EMBC44109.2020.9176563"},{"issue":"1","key":"pcbi.1010273.ref112","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13673-015-0052-z","article-title":"Heart rate monitoring using human speech spectral features","volume":"5","author":"AP James","year":"2015","journal-title":"Human-centric Computing and Information Sciences"}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010273","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,29]],"date-time":"2024-09-29T11:55:42Z","timestamp":1727610942000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010273"}},"subtitle":[],"editor":[{"given":"Fr\u00e9d\u00e9ric E.","family":"Theunissen","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,7,19]]},"references-count":112,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,7,19]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1010273","relation":{},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,19]]}}}