{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:21:29Z","timestamp":1750220489643,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":38,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T00:00:00Z","timestamp":1634428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["2019R1A6A3A13095526,2020R1A2C1012624,2021R1A2C2011452"],"award-info":[{"award-number":["2019R1A6A3A13095526,2020R1A2C1012624,2021R1A2C2011452"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,17]]},"DOI":"10.1145\/3474085.3475323","type":"proceedings-article","created":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T05:04:15Z","timestamp":1634533455000},"page":"1775-1783","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries"],"prefix":"10.1145","author":[{"given":"Woosung","family":"Choi","sequence":"first","affiliation":[{"name":"Korea University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Minseok","family":"Kim","sequence":"additional","affiliation":[{"name":"Korea University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marco A.","family":"Mart\u00ednez Ram\u00edrez","sequence":"additional","affiliation":[{"name":"Queen Mary University of London, London, United 
Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jaehwa","family":"Chung","sequence":"additional","affiliation":[{"name":"Korea National Open University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Soonyoung","family":"Jung","sequence":"additional","affiliation":[{"name":"Korea University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,17]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASPAA.2003.1285818"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.667"},{"key":"e_1_3_2_2_3_1","volume-title":"LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation. arXiv preprint arXiv:2010.11631","author":"Choi Woosung","year":"2020","unstructured":"Woosung Choi, Minseok Kim, Jaehwa Chung, and Soonyoung Jung. 2020. LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation. arXiv preprint arXiv:2010.11631 (2020)."},{"key":"e_1_3_2_2_4_1","volume-title":"Proceedings of the 21th International Society for Music Information Retrieval Conference.","author":"Choi Woosung","year":"2020","unstructured":"Woosung Choi, Minseok Kim, Jaehwa Chung, Daewon Lee, and Soonyoung Jung. 2020. Investigating u-nets with various intermediate blocks for spectrogram-based singing voice separation. 
In Proceedings of the 21th International Society for Music Information Retrieval Conference."},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1956.1056813"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9052990"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_2_8_1","volume-title":"18th International Society for Music Information Retrieval Conference. 745--751","author":"Jansson Andreas","year":"2017","unstructured":"Andreas Jansson, Eric Humphrey, Nicola Montecchio, Rachel Bittner, Aparna Kumar, and Tillman Weyde. 2017. Singing voice separation with deep u-net convolutional networks. In 18th International Society for Music Information Retrieval Conference. 745--751."},{"key":"e_1_3_2_2_9_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). 
http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00790"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/3367471.3367699"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413505"},{"key":"e_1_3_2_2_13_1","volume-title":"Deep learning for black-box modeling of audio effects. APPLIED SCIENCES-BASEL 10, 2","author":"Mart\u00ednez Ram\u00edrez Marco A","year":"2020","unstructured":"Marco A Mart\u00ednez Ram\u00edrez, Emmanouil Benetos, and Joshua D Reiss. 2020. Deep learning for black-box modeling of audio effects. APPLIED SCIENCES-BASEL 10, 2 (2020)."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683529"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.17743\/jaes.2020.0031"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.17743\/jaes.2020.0031"},{"key":"e_1_3_2_2_17_1","unstructured":"D. Matz Estefan\u00eda Cano and J. Abe\u00dfer. 2015. New Sonorities for Early Jazz Recordings Using Sound Source Separation and Automatic Mixing Tools. In ISMIR."},{"key":"e_1_3_2_2_18_1","volume-title":"20th International Society for Music Information Retrieval Conference, ISMIR (Ed.).","author":"Meseguer-Brocal Gabriel","year":"2019","unstructured":"Gabriel Meseguer-Brocal and Geoffroy Peeters. 2019. 
CONDITIONED-U-NET: Introducing a Control Mechanism in the U-net For Multiple Source Separations. In 20th International Society for Music Information Retrieval Conference, ISMIR (Ed.)."},{"key":"e_1_3_2_2_19_1","volume-title":"2nd AES Workshop on Intelligent Music Production","volume":"13","author":"Mimilakis Stylianos Ioannis","year":"2016","unstructured":"Stylianos Ioannis Mimilakis, Estefana Cano, Jakob Abe\u00dfer, and Gerald Schuller. 2016. New sonorities for jazz recordings: Separation and mixing using deep neural networks. In 2nd AES Workshop on Intelligent Music Production, Vol. 13."},{"key":"e_1_3_2_2_20_1","first-page":"1532","article-title":"Glove: Global Vectors for Word Representation","volume":"14","author":"Pennington Jeffrey","year":"2014","unstructured":"Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global Vectors for Word Representation. In EMNLP, Vol. 14. 1532--1543.","journal-title":"EMNLP"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"crossref","unstructured":"Ethan Perez Florian Strub Harm de Vries Vincent Dumoulin and Aaron C Courville. 2018. FiLM: Visual Reasoning with a General Conditioning Layer. 
In AAAI.","DOI":"10.1609\/aaai.v32i1.11671"},{"key":"e_1_3_2_2_22_1","volume-title":"Stylianos Ioannis Mimilakis, and Rachel Bittner","author":"Rafii Zafar","year":"2017","unstructured":"Zafar Rafii, Antoine Liutkus, Fabian-Robert St\u00f6ter, Stylianos Ioannis Mimilakis, and Rachel Bittner. 2017. MUSDB18 - a corpus for music separation. https:\/\/doi.org\/10.5281\/zenodo.1117371 MUSDB18: a corpus for music source separation."},{"key":"e_1_3_2_2_23_1","volume-title":"21st International Conference on Digital Audio Effects (DAFx-18)","author":"Ram\u00edrez Mart\u00ednez","year":"2018","unstructured":"Mart\u00ednez Ram\u00edrez and Joshua D Reiss. 2018. End-to-end equalization with convolutional neural networks. In 21st International Conference on Digital Audio Effects (DAFx-18)."},{"key":"e_1_3_2_2_24_1","volume-title":"Meta-learning Extractors for Music Source Separation. In ICASSP 2020--2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 816--820","author":"Samuel David","year":"2020","unstructured":"David Samuel, Aditya Ganeshan, and Jason Naradowsky. 2020. Meta-learning Extractors for Music Source Separation. In ICASSP 2020--2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 
IEEE, 816--820."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"e_1_3_2_2_26_1","volume-title":"Audio Engineering Society Convention 128","author":"Stein Michael","year":"2010","unstructured":"Michael Stein, Jakob Abe\u00dfer, Christian Dittmar, and Gerald Schuller. 2010. Automatic detection of audio effects in guitar and bass recordings. In Audio Engineering Society Convention 128. Audio Engineering Society."},{"key":"e_1_3_2_2_27_1","volume-title":"Automatic multitrack mixing with a differentiable mixing console of neural audio effects. arXiv preprint arXiv:2010.10291","author":"Steinmetz Christian J","year":"2020","unstructured":"Christian J Steinmetz, Jordi Pons, Santiago Pascual, and Joan Serr\u00e0. 2020. Automatic multitrack mixing with a differentiable mixing console of neural audio effects. arXiv preprint arXiv:2010.10291 (2020)."},{"key":"e_1_3_2_2_28_1","volume-title":"Efficient Neural Networks for Real-time Analog Audio Effect Modeling. arXiv preprint arXiv:2102.06200","author":"Steinmetz Christian J","year":"2021","unstructured":"Christian J Steinmetz and Joshua D Reiss. 2021. Efficient Neural Networks for Real-time Analog Audio Effect Modeling. 
arXiv preprint arXiv:2102.06200 (2021)."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952158"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2005.858005"},{"key":"e_1_3_2_2_32_1","volume-title":"Hershey","author":"Wisdom Scott","year":"2020","unstructured":"Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron J. Weiss, Kevin Wilson, and John R. Hershey. 2020. Unsupervised Sound Separation Using Mixture Invariant Training. In NeurIPS. https:\/\/arxiv.org\/pdf\/2006.12701.pdf"},{"key":"e_1_3_2_2_33_1","volume-title":"Real-Time Guitar Amplifier Emulation with Deep Learning. Applied Sciences 10, 3","author":"Wright Alec","year":"2020","unstructured":"Alec Wright, Eero-Pekka Damsk\u00e4gg, Lauri Juvela, and Vesa V\u00e4lim\u00e4ki. 2020. Real-Time Guitar Amplifier Emulation with Deep Learning. Applied Sciences 10, 3 (2020). https:\/\/doi.org\/10.3390\/app10030766"},{"key":"e_1_3_2_2_34_1","volume-title":"Audio spectrogram representations for processing with convolutional neural networks. arXiv preprint arXiv:1706.09559","author":"Wyse Lonce","year":"2017","unstructured":"Lonce Wyse. 2017. Audio spectrogram representations for processing with convolutional neural networks. 
arXiv preprint arXiv:1706.09559 (2017)."},{"key":"e_1_3_2_2_35_1","volume-title":"PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network. arXiv preprint arXiv:1911.04697","author":"Yin Dacheng","year":"2019","unstructured":"Dacheng Yin, Chong Luo, Zhiwei Xiong, and Wenjun Zeng. 2019. PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network. arXiv preprint arXiv:1911.04697 (2019)."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00577"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2019.2935867"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.244"}],"event":{"name":"MM '21: ACM Multimedia Conference","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Virtual Event China","acronym":"MM '21"},"container-title":["Proceedings of the 29th ACM International Conference on 
Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3475323","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474085.3475323","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:49:18Z","timestamp":1750193358000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3475323"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,17]]},"references-count":38,"alternative-id":["10.1145\/3474085.3475323","10.1145\/3474085"],"URL":"https:\/\/doi.org\/10.1145\/3474085.3475323","relation":{},"subject":[],"published":{"date-parts":[[2021,10,17]]},"assertion":[{"value":"2021-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}