{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T06:37:43Z","timestamp":1769755063112,"version":"3.49.0"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,7]]},"abstract":"<jats:p>Audio synthesizers are pervasive in modern music production. These highly complex audio generation functions provide a unique diversity through their large sets of parameters. However, this feature also can make them extremely hard and obfuscated to use, especially for non-expert users with no formal knowledge on signal processing.\n\nWe recently introduced a novel formalization of the problem of synthesizer control as learning an invertible mapping between an audio latent space, extracted from the audio signal, and a target parameter latent space, extracted from the synthesizer's presets, using normalizing flows. In addition to model a continuous representation allowing to ease the intuitive exploration of the synthesizer, it also provides a ground-breaking method for audio-based parameter inference, vocal control and macro-control learning.\n\nHere, we discuss the details of integrating these high-level features to develop new interaction schemes between a human user and the generating device: parameters inference from audio, high-level preset visualization and interpolation, that can be used both in off-time and real-time situations. Moreover, we also leverage LeapMotion devices to allow the control of hundreds of parameters simply by moving one hand across space to explore the low-dimensional latent space, allowing to both empower and facilitate the user's interaction with the synthesizer.<\/jats:p>","DOI":"10.24963\/ijcai.2020\/767","type":"proceedings-article","created":{"date-parts":[[2020,7,8]],"date-time":"2020-07-08T22:12:49Z","timestamp":1594246369000},"page":"5273-5275","source":"Crossref","is-referenced-by-count":3,"title":["FlowSynth: Simplifying Complex Audio Generation Through Explorable Latent Spaces with Normalizing Flows"],"prefix":"10.24963","author":[{"given":"Philippe","family":"Esling","sequence":"first","affiliation":[{"name":"IRCAM - CNRS UMR 9912, Sorbonne Universit\u00e9"}]},{"given":"Naotake","family":"Masuda","sequence":"additional","affiliation":[{"name":"University of Tokyo"}]},{"given":"Axel","family":"Chemla--Romeu-Santos","sequence":"additional","affiliation":[{"name":"IRCAM - CNRS UMR 9912, Sorbonne Universit\u00e9"}]}],"member":"10584","event":{"name":"Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}","theme":"Artificial Intelligence","location":"Yokohama, Japan","acronym":"IJCAI-PRICAI-2020","number":"28","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2020,7,11]]},"end":{"date-parts":[[2020,7,17]]}},"container-title":["Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2020,7,8]],"date-time":"2020-07-08T22:17:06Z","timestamp":1594246626000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2020\/767"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2020,7]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2020\/767","relation":{},"subject":[],"published":{"date-parts":[[2020,7]]}}}