{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T23:26:31Z","timestamp":1769642791494,"version":"3.49.0"},"reference-count":13,"publisher":"Association for Computing Machinery (ACM)","issue":"CSCW","license":[{"start":{"date-parts":[[2017,12,6]],"date-time":"2017-12-06T00:00:00Z","timestamp":1512518400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1544753"],"award-info":[{"award-number":["1544753"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Hum.-Comput. Interact."],"published-print":{"date-parts":[[2017,12,6]]},"abstract":"<jats:p>Audio annotation is key to developing machine-listening systems; yet, effective ways to accurately and rapidly obtain crowdsourced audio annotations are understudied. In this work, we seek to quantify the reliability\/redundancy trade-off in crowdsourced soundscape annotation, investigate how visualizations affect accuracy and efficiency, and characterize how performance varies as a function of audio characteristics. Using a controlled experiment, we varied sound visualizations and the complexity of soundscapes presented to human annotators. Results show that more complex audio scenes result in lower annotator agreement, and spectrogram visualizations are superior in producing higher-quality annotations at lower cost of time and human labor. We also found recall is more affected than precision by soundscape complexity, and mistakes can often be attributed to certain sound event characteristics. 
These findings have implications not only for how we should design annotation tasks and interfaces for audio data, but also for how we train and evaluate machine-listening systems.<\/jats:p>","DOI":"10.1145\/3134664","type":"journal-article","created":{"date-parts":[[2017,12,6]],"date-time":"2017-12-06T21:23:15Z","timestamp":1512595395000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":48,"title":["Seeing Sound"],"prefix":"10.1145","volume":"1","author":[{"given":"Mark","family":"Cartwright","sequence":"first","affiliation":[{"name":"New York University, New York, NY, USA"}]},{"given":"Ayanna","family":"Seals","sequence":"additional","affiliation":[{"name":"New York University, New York, NY, USA"}]},{"given":"Justin","family":"Salamon","sequence":"additional","affiliation":[{"name":"New York University, New York, NY, USA"}]},{"given":"Alex","family":"Williams","sequence":"additional","affiliation":[{"name":"University of Waterloo, Waterloo, ON, Canada"}]},{"given":"Stefanie","family":"Mikloska","sequence":"additional","affiliation":[{"name":"University of Waterloo, Waterloo, ON, Canada"}]},{"given":"Duncan","family":"MacConnell","sequence":"additional","affiliation":[{"name":"New York University, New York, NY, USA"}]},{"given":"Edith","family":"Law","sequence":"additional","affiliation":[{"name":"University of Waterloo, Waterloo, ON, Canada"}]},{"given":"Juan P.","family":"Bello","sequence":"additional","affiliation":[{"name":"New York University, New York, NY, USA"}]},{"given":"Oded","family":"Nov","sequence":"additional","affiliation":[{"name":"New York University, New York, NY, USA"}]}],"member":"320","published-online":{"date-parts":[[2017,12,6]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Apple Inc. 2017. Apple GarageBand. (2017). 
http:\/\/www.apple.com\/mac\/garageband\/."},{"key":"e_1_2_1_2_1","unstructured":"Avid Technology Inc. 2017. Pro Tools. (2017). http:\/\/www.avid.com\/pro-tools."},{"key":"e_1_2_1_3_1","volume-title":"Soundnet: Learning sound representations from unlabeled video Proc. of Advances in Neural Information Processing Systems. 892--900.","author":"Aytar Yusuf","year":"2016","unstructured":"Yusuf Aytar, Carl Vondrick, and Antonio Torralba. 2016. Soundnet: Learning sound representations from unlabeled video Proc. of Advances in Neural Information Processing Systems. 892--900."},{"key":"e_1_2_1_4_1","unstructured":"BBC. 2017. BBC Sound Effects Library. (2017). https:\/\/www.sound-ideas.com\/Product\/154\/BBC-Sound-Effects-Library-Original-CDs-1--60"},{"key":"e_1_2_1_5_1","unstructured":"Mark Cartwright and Bryan Pardo. 2013. Social-EQ: Crowdsourcing an Equalization Descriptor Map Proc. of the International Society for Music Information Retrieval Conference."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2702123.2702387"},{"key":"e_1_2_1_7_1","volume-title":"Fast and Easy Crowdsourced Perceptual Audio Evaluation Proc. of the International Conference on Acoustics, Speech and Signal Processing.","author":"Cartwright Mark","year":"2016","unstructured":"Mark Cartwright, Bryan Pardo, Gautham Mysore, and Matthew Hoffman. 2016. 
Fast and Easy Crowdsourced Perceptual Audio Evaluation Proc. of the International Conference on Acoustics, Speech and Signal Processing."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3026044"},{"key":"e_1_2_1_9_1","unstructured":"Cornell Lab of Ornithology. 2017. Raven. (2017). http:\/\/www.birds.cornell.edu\/brp\/raven\/RavenOverview.html"},{"key":"e_1_2_1_10_1","volume-title":"Imagenet: A large-scale hierarchical image database Proc. of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Deng Jia","year":"2009","unstructured":"Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database Proc. of the IEEE Conference on Computer Vision and Pattern Recognition. 
IEEE, 248--255."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2556288.2557011"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2660168.2660169"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2015.06.026"}],"container-title":["Proceedings of the ACM on Human-Computer Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3134664","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3134664","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3134664","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:11:24Z","timestamp":1750212684000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3134664"}},"subtitle":["Investigating the Effects of Visualizations and Complexity on Crowdsourced Audio Annotations"],"short-title":[],"issued":{"date-parts":[[2017,12,6]]},"references-count":13,"journal-issue":{"issue":"CSCW","published-print":{"date-parts":[[2017,12,6]]}},"alternative-id":["10.1145\/3134664"],"URL":"https:\/\/doi.org\/10.1145\/3134664","relation":{},"ISSN":["2573-0142"],"issn-type":[{"value":"2573-0142","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,12,6]]},"assertion":[{"value":"2017-12-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}