{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T07:13:40Z","timestamp":1774077220058,"version":"3.50.1"},"reference-count":21,"publisher":"Association for Natural Language Processing","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Journal of Natural Language Processing"],"published-print":{"date-parts":[[2026]]},"DOI":"10.5715\/jnlp.33.186","type":"journal-article","created":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T22:12:53Z","timestamp":1773526373000},"page":"186-206","source":"Crossref","is-referenced-by-count":0,"title":["Which One Sounds More Human-like? \u2014Comparison of Synthetic Speech Trained on Japanese Daily Conversational Data with and without Disfluency"],"prefix":"10.5715","volume":"33","author":[{"given":"Akiko","family":"Mokhtari","sequence":"first","affiliation":[{"name":"Toyama Prefectural University"}]},{"given":"Hiroaki","family":"Hatano","sequence":"additional","affiliation":[{"name":"Kobe University"}]},{"given":"Jun","family":"Arai","sequence":"additional","affiliation":[{"name":"Kwansei Gakuin University"}]},{"given":"Nick","family":"Campbell","sequence":"additional","affiliation":[{"name":"The University of Dublin"}]},{"given":"Toshiyuki","family":"Sadanobu","sequence":"additional","affiliation":[{"name":"Kyoto University"}]}],"member":"3685","reference":[{"key":"1","doi-asserted-by":"crossref","unstructured":"Arnold, J. E., Fagnano, M. and Tanenhaus, M. K. (2003). \u201cDisfluencies Signal Theee, Um, New Information.\u201d <i>Journal of Psycholinguistic Research<\/i>, 32(1), pp. 25\u201336.","DOI":"10.1023\/A:1021980931292"},{"key":"2","unstructured":"Campbell, N. (2000). \u201cDatabases of emotional speech.\u201d <i>In Proceedings of the ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research<\/i>, pp. 
34\u201337, Newcastle, Northern Ireland, UK."},{"key":"3","doi-asserted-by":"crossref","unstructured":"Campbell, N. (2010). \u201cExpressive Speech Processing and Prosody Engineering: An Illustrated Essay on the Fragmented Nature of Real Interactive Speech.\u201d <i>Speech Technology<\/i>, pp. 105\u2013121, Springer.","DOI":"10.1007\/978-0-387-73819-2_7"},{"key":"4","doi-asserted-by":"crossref","unstructured":"Corley, M., MacGregor, L. J. and Donaldson, D. I. (2007). \u201cIt&apos;s the Way That You, Er, Say It: Hesitations in Speech Affect Language Comprehension.\u201d <i>Cognition<\/i>, 105(3), pp. 658\u2013668.","DOI":"10.1016\/j.cognition.2006.10.010"},{"key":"5","unstructured":"Donahue, C., McAuley, J., and Puckette, M. (2019). \u201cAdversarial Audio Synthesis.\u201d <i>In Proceedings of International Conference on Learning Representations (ICLR)<\/i>, New Orleans, LA, USA."},{"key":"6","doi-asserted-by":"crossref","unstructured":"Funahashi, M., Sudo, J., Sadanobu, T. and Shochi, T. (2026). \u201cTeaching Disfluency in Japanese Language Education and Its Effects on Communication: A Study Focused on Getting-Stuck Utterances.\u201d <i>Disfluencies We Live With<\/i>, Routledge.","DOI":"10.4324\/9781003648369-12"},{"key":"7","unstructured":"Labov, W. (1972). <i>Sociolinguistic Patterns<\/i>. University of Pennsylvania Press."},{"key":"8","doi-asserted-by":"crossref","unstructured":"Lickley, R. J. (2015). \u201cFluency and disfluency.\u201d <i>The Handbook of Speech Production<\/i>, pp. 445\u2013474, Wiley-Blackwell.","DOI":"10.1002\/9781118584156.ch20"},{"key":"9","doi-asserted-by":"crossref","unstructured":"\u0141a\u0144cucki, A. (2021). \u201cFastPitch: Parallel Text-to-speech with Pitch Prediction.\u201d <i>In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing<\/i>. pp. 
6588\u20136592, Online.","DOI":"10.1109\/ICASSP39728.2021.9413889"},{"key":"10","doi-asserted-by":"crossref","unstructured":"Matsunaga, Y., Saeki, T., Takamichi, S. and Saruwatari, H. (2023). \u201cImproving Robustness of Spontaneous Speech Synthesis with Linguistic Speech Regularization and Pseudo-filled-pause Insertion.\u201d In <i>Proceedings of the 12th ISCA Speech Synthesis Workshop<\/i>, pp. 62\u201368, Grenoble, France.","DOI":"10.21437\/SSW.2023-10"},{"key":"11","unstructured":"\u30e2\u30af\u30bf\u30ea\u660e\u5b50, \u30cb\u30c3\u30af \u30ad\u30e3\u30f3\u30d9\u30eb, \u30e9\u30e0 \u30bf\u30a4 \u30d5\u30c3\u30af, \u5b9a\u5ef6\u5229\u4e4b (2024). \u975e\u6d41\u66a2\u306a\u97f3\u58f0\u5408\u6210\u306b\u5411\u3051\u3066, \u300e\u6d41\u66a2\u6027\u3068\u975e\u6d41\u66a2\u6027\u300f, pp. 439\u2013509, \u3072\u3064\u3058\u66f8\u623f. [A. Mokhtari et al. (2024). Hiryucho na onsei gosei ni mukete. Fluency and Disfluency. pp. 439\u2013509, Hitsuji Publishing Co.]."},{"key":"12","unstructured":"\u30e2\u30af\u30bf\u30ea\u660e\u5b50, \u6ce2\u591a\u91ce\u535a\u9855, \u65b0\u4e95\u6f64, \u30ad\u30e3\u30f3\u30d9\u30eb \u30cb\u30c3\u30af, \u5b9a\u5ef6\u5229\u4e4b (2025). \u5408\u6210\u97f3\u58f0\u306b\u304a\u3051\u308b\u81ea\u7136\u306a\u975e\u6d41\u66a2\u6027\u306e\u7a2e\u985e\u3068\u983b\u5ea6\u306b\u3064\u3044\u3066\u306e\u8003\u5bdf. \u7b2c39\u56de\u65e5\u672c\u97f3\u58f0\u5b66\u4f1a\u5168\u56fd\u5927\u4f1a\u4e88\u7a3f\u96c6, pp. 72\u201377. [A. Mokhtari et al. (2025). A Study on the Types and Frequencies of Natural Disfluencies in Synthetic Speech. In Proceedings of the 39th General Meeting of the Phonetic Society of Japan, pp. 72\u201377.]."},{"key":"13","doi-asserted-by":"crossref","unstructured":"Mokhtari, A., Hatano, H., Arai, J., Campbell, N. and Sadanobu, T. (2026). \u201cToward Expressive and Disfluent Speech Synthesis.\u201d <i>Disfluencies We Live With<\/i>, Routledge.","DOI":"10.4324\/9781003648369-13"},{"key":"14","unstructured":"\u5b9a\u5ef6\u5229\u4e4b (2019). 
\u6587\u7bc0\u306e\u6587\u6cd5. \u5927\u4fee\u9928\u66f8\u5e97. [T. Sadanobu (2019). Bunsetsu no Bunpou. Taishukan Publishing Co.]."},{"key":"15","unstructured":"Sadanobu, T. (2021). \u201cAttitudinal Correlates of Word-internal Disfluencies in Japanese Communication.\u201d <i>In Proceedings of the 10th Workshop on Disfluency in Spontaneous Speech<\/i>, Online, pp. 5\u201310."},{"key":"16","doi-asserted-by":"crossref","unstructured":"Schettino, L., Origlia, A. and Cutugno, F. (2024). \u201cThough This Be Hesitant, Yet There Is Method in &apos;t: Effects of Disfluency Patterns in Neural Speech Synthesis For Cultural Heritage Presentations.\u201d <i>Computer Speech &amp; Language<\/i>, 85, Article 101585, pp. 1\u201317.","DOI":"10.1016\/j.csl.2023.101585"},{"key":"17","unstructured":"Shriberg, E.E. (1994). <i>Preliminaries to a Theory of Speech Disfluencies<\/i>. Ph.D. thesis, University of California at Berkeley."},{"key":"18","doi-asserted-by":"crossref","unstructured":"Wagner, P., Beskow, J., Betz, S., Edlund, J., Gustafson, J., Henter, G.E., Maguer, S. L., Malisz, Z., Sz\u00e9kely, \u00c9., T\u00e5nnander, C. and Vo\u00dfe, J. (2019). \u201cSpeech Synthesis Evaluation \u2014 State-of-the-Art Assessment and Suggestion for a Novel Research Program.\u201d <i>In Proceedings of the 10th ISCA Workshop on Speech Synthesis (SSW 10)<\/i>, pp. 105\u2013110, Vienna, Austria.","DOI":"10.21437\/SSW.2019-19"},{"key":"19","unstructured":"Wang, S., Gustafson, J., and Sz\u00e9kely, \u00c9. (2022). \u201cEvaluating Sampling-Based Filler Insertion with Spontaneous TTS.\u201d <i>In Proceedings of the 13th Language Resources and Evaluation Conference (LREC)<\/i>, pp. 1960\u20131969, Marseille, France."},{"key":"20","doi-asserted-by":"crossref","unstructured":"Watanabe, M., Hirose, K., Den, Y. and Minematsu, N. (2008). \u201cFilled Pauses as Cues to the Complexity of Upcoming Phrases for Native and Non-Native Listeners.\u201d <i>Speech Communication<\/i>, 50(2), pp. 
81\u201394.","DOI":"10.1016\/j.specom.2007.06.002"},{"key":"21","unstructured":"\u5c71\u4e0b\u512a\u6a39, \u90e1\u5c71\u77e5\u6a39, \u9f4b\u85e4\u4f51\u6a39, \u9ad8\u9053\u614e\u4e4b\u4ecb, \u4e95\u5cf6\u52c7\u7950, \u5897\u6751\u4eae, \u733f\u6e21\u6d0b (2020). DNN \u306b\u57fa\u3065\u304f\u8a71\u3057\u8a00\u8449\u97f3\u58f0\u5408\u6210\u306b\u304a\u3051\u308b\u8ffd\u52a0\u30b3\u30f3\u30c6\u30ad\u30b9\u30c8\u306e\u52b9\u679c. \u4fe1\u5b66\u6280\u5831, 119(441), SP2019-61, pp. 65\u201370. [Y. Yamashita et al. (2020). The Effectiveness of Additional Context in DNN-based Spontaneous Speech Synthesis. Technical Report of IEICE, pp. 65\u201370.]."}],"container-title":["Journal of Natural Language Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/jnlp\/33\/1\/33_186\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T03:53:16Z","timestamp":1774065196000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/jnlp\/33\/1\/33_186\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026]]},"references-count":21,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026]]}},"URL":"https:\/\/doi.org\/10.5715\/jnlp.33.186","relation":{},"ISSN":["1340-7619","2185-8314"],"issn-type":[{"value":"1340-7619","type":"print"},{"value":"2185-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026]]}}}