{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,29]],"date-time":"2025-12-29T18:57:59Z","timestamp":1767034679637,"version":"3.41.0"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"2",
"license":[{"start":{"date-parts":[[2016,8,3]],"date-time":"2016-08-03T00:00:00Z","timestamp":1470182400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],
"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Interact. Intell. Syst."],"published-print":{"date-parts":[[2016,8,3]]},
"abstract":"<jats:p>Techniques that use nonverbal behaviors to predict turn-changing situations\u2014such as, in multiparty meetings, who the next speaker will be and when the next utterance will occur\u2014have been receiving a lot of attention in recent research. To build a model for predicting these behaviors we conducted a research study to determine whether respiration could be effectively used as a basis for the prediction. Results of analyses of utterance and respiration data collected from participants in multiparty meetings reveal that the speaker takes a breath more quickly and deeply after the end of an utterance in turn-keeping than in turn-changing. They also indicate that the listener who will be the next speaker takes a bigger breath more quickly and deeply in turn-changing than the other listeners. On the basis of these results, we constructed and evaluated models for predicting the next speaker and the time of the next utterance in multiparty meetings. The results of the evaluation suggest that the characteristics of the speaker's inhalation right after an utterance unit\u2014the points in time at which the inhalation starts and ends after the end of the utterance unit and the amplitude, slope, and duration of the inhalation phase\u2014are effective for predicting the next speaker in multiparty meetings. They further suggest that the characteristics of listeners' inhalation\u2014the points in time at which the inhalation starts and ends after the end of the utterance unit and the minimum and maximum inspiration, amplitude, and slope of the inhalation phase\u2014are effective for predicting the next speaker. The start time and end time of the next speaker's inhalation are also useful for predicting the time of the next utterance in turn-changing.<\/jats:p>",
"DOI":"10.1145\/2946838","type":"journal-article","created":{"date-parts":[[2016,8,4]],"date-time":"2016-08-04T13:26:34Z","timestamp":1470317194000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":23,
"title":["Using Respiration to Predict Who Will Speak Next and When in Multiparty Meetings"],"prefix":"10.1145","volume":"6",
"author":[{"given":"Ryo","family":"Ishii","sequence":"first","affiliation":[{"name":"NTT Corporation"}]},{"given":"Kazuhiro","family":"Otsuka","sequence":"additional","affiliation":[{"name":"NTT Corporation"}]},{"given":"Shiro","family":"Kumano","sequence":"additional","affiliation":[{"name":"NTT Corporation"}]},{"given":"Junji","family":"Yamato","sequence":"additional","affiliation":[{"name":"NTT Corporation"}]}],
"member":"320","published-online":{"date-parts":[[2016,8,3]]},
"reference":[{"key":"e_1_2_1_1_1","unstructured":"Athos. 2014. http:\/\/www.mindmedia.info\/CMS2014\/products\/systems\/nexus-10-mkii."},
{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/1756006.1953016"},
{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1647314.1647320"},
{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1647314.1647332"},
{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the Annual Conference on the International Speech Communication Association. 2306--2309","author":"Dielmann Alfred","year":"2010","unstructured":"Alfred Dielmann, Giulia Garau, and Herv\u00e9 Bourlard. 2010. Floor holder detection and end of speaker turn prediction in meetings. In Proceedings of the Annual Conference on the International Speech Communication Association. 2306--2309."},
{"key":"e_1_2_1_6_1","volume-title":"Proceedings of Multimodal Corpora: Combining Applied and Basic Research Targets. 35--36","author":"Edlund Jens","year":"2014","unstructured":"Jens Edlund, Mattias Heldner, and Marcin W\u0142odarczak. 2014. Catching wind of multiparty conversation. In Proceedings of Multimodal Corpora: Combining Applied and Basic Research Targets. 35--36."},
{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the Annual Conference on the International Speech Communication Association","volume":"3","author":"Ferrer Luciana","year":"2002","unstructured":"Luciana Ferrer, Elizabeth Shriberg, and Andreas Stolcke. 2002. Is the speaker done yet? Faster and more accurate end-of-utterance detection using prosody in human-computer dialog. In Proceedings of the Annual Conference on the International Speech Communication Association, Vol. 3. 2061--2064."},
{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the Annual Conference on the International Speech Communication Association. 229--232","author":"Fukuda Takashi","year":"2011","unstructured":"Takashi Fukuda, Osamu Ichikawa, and Masafumi Nishimura. 2011. Combining feature space discriminative training with long-term spectro-temporal features for noise-robust speech recognition. In Proceedings of the Annual Conference on the International Speech Communication Association. 229--232."},
{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MFI.2006.265658"},
{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. 2319--2323","author":"Ishii Ryo","year":"2015","unstructured":"Ryo Ishii, Shiro Kumano, and Kazuhiro Otsuka. 2015. Predicting next speaker using head movement in multi-party meetings. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. 2319--2323."},
{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2522848.2522856"},
{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2757284"},
{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6853685"},
{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2499474.2499481"},
{"volume-title":"Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics.","author":"Jovanovic Natasa","key":"e_1_2_1_15_1","unstructured":"Natasa Jovanovic, Rieks op den Akker, and Anton Nijholt. 2006. Addressee identification in face-to-face meetings. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics."},
{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the Annual Conference on the International Speech Communication Association.","author":"Kawahara Tatsuya","year":"2012","unstructured":"Tatsuya Kawahara, Takuma Iwatate, and Katsuya Takanashi. 2012. Prediction of turn-taking by combining prosodic and eye-gaze information in poster conversations. In Proceedings of the Annual Conference on the International Speech Communication Association."},
{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976601300014493"},
{"key":"e_1_2_1_18_1","first-page":"22","article-title":"Some functions of gaze direction in social interaction","volume":"26","author":"Kendon Adam","year":"1967","unstructured":"Adam Kendon. 1967. Some functions of gaze direction in social interaction. Acta Psychologica 26 (1967), 22--63.","journal-title":"Acta Psychologica"},
{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1177\/002383099804100404"},
{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2011.5947629"},
{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the SIGHAN Workshop on Chinese Language Processing.","author":"Levow Gina-Anne","year":"2005","unstructured":"Gina-Anne Levow. 2005. Turn-taking in Mandarin dialogue: Interactions of tones and intonation. In Proceedings of the SIGHAN Workshop on Chinese Language Processing."},
{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1044\/1092-4388(2001\/012)"},
{"key":"e_1_2_1_23_1","unstructured":"MIND MEDIA. 2014. NeXus-10 MARKII. http:\/\/www.mindmedia.info\/CMS2014\/products\/systems\/nexus-10-mkii."},
{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2011.941100"},
{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1647314.1647354"},
{"key":"e_1_2_1_26_1","unstructured":"Philips. 2014. Vital Signs Camera. http:\/\/www.vitalsignscamera.com\/."},
{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Am\u00e9lie Rochet-Capellan, G\u00e9rard Bailly, and Susanne Fuchs. 2014. Is breathing sensitive to the communication partner? In Speech Prosody. 613--617.","DOI":"10.21437\/SpeechProsody.2014-111"},
{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1353\/lan.1974.0010"},
{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the Annual Conference on the International Speech Communication Association. 17--21","author":"Schlangen David","year":"2006","unstructured":"David Schlangen. 2006. From reaction to prediction: Experiments with computational models of turn-taking. In Proceedings of the Annual Conference on the International Speech Communication Association. 17--21."},
{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:STCO.0000035301.49549.88"},
{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1044\/jshr.3801.124"},
{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1118\/1.4704644"}],
"container-title":["ACM Transactions on Interactive Intelligent Systems"],"original-title":[],"language":"en",
"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2946838","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2946838","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],
"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:07:02Z","timestamp":1750223222000},"score":1,
"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2946838"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,8,3]]},"references-count":32,
"journal-issue":{"issue":"2","published-print":{"date-parts":[[2016,8,3]]}},"alternative-id":["10.1145\/2946838"],"URL":"https:\/\/doi.org\/10.1145\/2946838","relation":{},
"ISSN":["2160-6455","2160-6463"],"issn-type":[{"type":"print","value":"2160-6455"},{"type":"electronic","value":"2160-6463"}],"subject":[],"published":{"date-parts":[[2016,8,3]]},
"assertion":[{"value":"2015-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-08-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}