{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T23:21:53Z","timestamp":1769210513113,"version":"3.49.0"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2018,6,21]],"date-time":"2018-06-21T00:00:00Z","timestamp":1529539200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1617497"],"award-info":[{"award-number":["1617497"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Interact. Intell. Syst."],"published-print":{"date-parts":[[2018,6,30]]},"abstract":"<jats:p>Labeling of audio events is essential for many tasks. However, finding sound events and labeling them within a long audio file is tedious and time-consuming. In cases where there is very little labeled data (e.g., a single labeled example), it is often not feasible to train an automatic labeler because many techniques (e.g., deep learning) require a large number of human-labeled training examples. Also, fully automated labeling may not show sufficient agreement with human labeling for many uses. To solve this issue, we present a human-in-the-loop sound labeling system that helps a user quickly label target sound events in a long audio. It lets a user reduce the time required to label a long audio file (e.g., 20 hours) containing target sounds that are sparsely distributed throughout the recording (10% or less of the audio contains the target) when there are too few labeled examples (e.g., one) to train a state-of-the-art machine audio labeling system. 
To evaluate the effectiveness of our tool, we performed a human-subject study. The results show that it helped participants label target sound events twice as fast as labeling them manually. In addition to measuring the overall performance of the proposed system, we also measure interaction overhead and machine accuracy, which are two key factors that determine the overall performance. The analysis shows that an ideal interface that does not have interaction overhead at all could speed labeling by as much as a factor of four.<\/jats:p>","DOI":"10.1145\/3214366","type":"journal-article","created":{"date-parts":[[2018,6,21]],"date-time":"2018-06-21T12:04:33Z","timestamp":1529582673000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":38,"title":["A Human-in-the-Loop System for Sound Event Detection and Annotation"],"prefix":"10.1145","volume":"8","author":[{"given":"Bongjun","family":"Kim","sequence":"first","affiliation":[{"name":"Northwestern University, USA"}]},{"given":"Bryan","family":"Pardo","sequence":"additional","affiliation":[{"name":"Northwestern University, USA"}]}],"member":"320","published-online":{"date-parts":[[2018,6,21]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1753326.1753531"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1978942.1978966"},{"key":"e_1_2_2_3_1","volume-title":"Proceedings of the International Computer Music Conference (ICMC). Michigan Publishing, 1--1.","author":"Bogaards Niels","year":"2004"},{"key":"e_1_2_2_4_1","volume-title":"Proceedings of the International Computer Music Conference (ICMC). 
Michigan Publishing, 1--1.","author":"Bogaards Niels","year":"2008"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1178723.1178741"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2556288.2557253"},{"key":"e_1_2_2_7_1","volume-title":"Proceedings of 7th International Conference on Music Information Retrieval, ISMIR","author":"Cannam Chris","year":"2006"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-28551-6_54"},{"key":"e_1_2_2_9_1","doi-asserted-by":"crossref","unstructured":"Rebecca Anne Fiebrink. 2011. Real-time Human Interaction with Supervised Learning Algorithms for Music Composition and Performance. Ph.D. Dissertation. Princeton NJ USA. Advisor(s) Cook Perry R. AAI3445567.","DOI":"10.1145\/1753846.1753889"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1357054.1357061"},{"key":"e_1_2_2_11_1","volume-title":"Proceedings of the 10th International Society for Music Information Retrieval Conference, ISMIR 2009. 
ISMIR, 321--326","author":"Fuhrmann Ferdinand","year":"2009"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1282280.1282347"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465958.2465979"},{"key":"e_1_2_2_14_1","volume-title":"Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011","author":"Gulluni S\u00e9bastien","year":"2011"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952132"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3025171.3025231"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1322192.1322229"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1155\/2009\/239892"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2008.2005345"},{"key":"e_1_2_2_20_1","volume-title":"Proceedings of 7th International Conference on Music Information Retrieval. ISMIR, 379--380","author":"Li Beinan","year":"2006"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1037\/h0021987"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2011.5946389"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7472917"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2009.4959998"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1006\/dspr.1999.0361"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1186562.1015720"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298824"},{"key":"e_1_2_2_29_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. 
Association for Computational Linguistics, 1467--1478","author":"Settles Burr","year":"2011"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2428998"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1518701.1518895"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13735-012-0014-4"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/AVSS.2007.4425280"},{"key":"e_1_2_2_34_1","volume-title":"Advances in Neural Information Processing Systems. Curran Associates","author":"Vijayanarasimhan Sudheendra"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-014-0721-9"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/985692.985733"},{"key":"e_1_2_2_37_1","volume-title":"Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 1--3.","author":"Vuegena L."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2012.05.008"},{"key":"e_1_2_2_39_1","volume-title":"Proceedings of the International Conference on Image Processing","volume":"2","author":"Wu Emin","year":"2002"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-23344-4_34"}],"container-title":["ACM Transactions on Interactive Intelligent 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3214366","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3214366","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3214366","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:54:19Z","timestamp":1750287259000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3214366"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,21]]},"references-count":39,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2018,6,30]]}},"alternative-id":["10.1145\/3214366"],"URL":"https:\/\/doi.org\/10.1145\/3214366","relation":{},"ISSN":["2160-6455","2160-6463"],"issn-type":[{"value":"2160-6455","type":"print"},{"value":"2160-6463","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,6,21]]},"assertion":[{"value":"2016-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-06-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}