{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T18:45:12Z","timestamp":1761417912630,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":17,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,10,15]],"date-time":"2019-10-15T00:00:00Z","timestamp":1571097600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,10,15]]},"DOI":"10.1145\/3343031.3351148","type":"proceedings-article","created":{"date-parts":[[2019,10,21]],"date-time":"2019-10-21T16:32:26Z","timestamp":1571675546000},"page":"1518-1525","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Towards a Perceptual Loss"],"prefix":"10.1145","author":[{"given":"Ishwarya","family":"Ananthabhotla","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, MA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sebastian","family":"Ewert","sequence":"additional","affiliation":[{"name":"Spotify, Inc., London, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joseph A.","family":"Paradiso","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, MA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,10,15]]},"reference":[{"volume-title":"Introduction to digital audio coding and standards","author":"Bosi Marina","key":"e_1_3_2_1_1_1","unstructured":"Marina Bosi and Richard E Goldberg. 2012. 
Introduction to digital audio coding and standards. Vol. 721. Springer Science & Business Media."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.168"},{"key":"e_1_3_2_1_3_1","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR)","author":"Donahue Chris","year":"2019","unstructured":"Chris Donahue, Julian McAuley, and Miller Puckette. 2019. Adversarial Audio Synthesis. In Proceedings of the International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_2_1_4_1","unstructured":"Alexey Dosovitskiy and Thomas Brox. 2016. Generating images with perceptual similarity metrics based on deep networks. In Advances in Neural Information Processing Systems (NIPS). 658--666."},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of the International Conference on Latent Variable Analysis and Signal Separation (LVA\/ICA)","author":"Emiya Valentin","year":"2010","unstructured":"Valentin Emiya, Emmanuel Vincent, Niklas Harlander, and Volker Hohmann. 2010. The PEASS Toolkit-Perceptual Evaluation methods for Audio Source Separation. In Proceedings of the International Conference on Latent Variable Analysis and Signal Separation (LVA\/ICA)."},{"key":"e_1_3_2_1_6_1","volume-title":"Speech denoising with deep feature losses. 
arXiv preprint arXiv:1806.10522","author":"Germain Francois G","year":"2018","unstructured":"Francois G Germain, Qifeng Chen, and Vladlen Koltun. 2018. Speech denoising with deep feature losses. arXiv preprint arXiv:1806.10522 (2018)."},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of the International Society for Music Information Retrieval (ISMIR)","author":"Jansson Andreas","year":"2017","unstructured":"Andreas Jansson, Eric Humphrey, Nicola Montecchio, Rachel Bittner, Aparna Kumar, and Tillman Weyde. 2017. Singing voice separation with deep U-Net convolutional networks. In Proceedings of the International Society for Music Information Retrieval (ISMIR)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_43"},{"key":"e_1_3_2_1_9_1","unstructured":"Sung Kim and Visvesh Sathe. 2019. Adversarial Audio Super-Resolution with Unsupervised Feature Losses. https:\/\/openreview.net\/forum?id=H1eH4n09KX"},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR)","author":"Mehri Soroush","year":"2017","unstructured":"
Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, and Yoshua Bengio. 2017. SampleRNN: An unconditional end-to-end neural audio generation model. In Proceedings of the International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_2_1_11_1","volume-title":"Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499","author":"van den Oord Aaron","year":"2016","unstructured":"Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. 2016. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2001.941023"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8461722"},{"key":"e_1_3_2_1_14_1","first-page":"3","article-title":"PEAQ - The ITU standard for objective measurement of perceived audio quality","volume":"48","author":"Thiede Thilo","year":"2000","unstructured":"Thilo Thiede, William C Treurniet, Roland Bitto, Christian Schmidmer, Thomas Sporer, John G Beerends, and Catherine Colomes. 2000. PEAQ - The ITU standard for objective measurement of perceived audio quality. Journal of the Audio Engineering Society, Vol. 
48, 1\/2 (2000), 3--29.","journal-title":"Journal of the Audio Engineering Society"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2017-1452"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8461965"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462593"}],"event":{"name":"MM '19: The 27th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Nice France","acronym":"MM '19"},"container-title":["Proceedings of the 27th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3343031.3351148","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3343031.3351148","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:13:12Z","timestamp":1750201992000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3343031.3351148"}},"subtitle":["Using a Neural Network Codec Approximation as a Loss for Generative Audio Models"],"short-title":[],"issued":{"date-parts":[[2019,10,15]]},"references-count":17,"alternative-id":["10.1145\/3343031.3351148","10.1145\/3343031"],"URL":"https:\/\/doi.org\/10.1145\/3343031.3351148","relation":{},"subject":[],"published":{"date-parts":[[2019,10,15]]},"assertion":[{"value":"2019-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}