{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,13]],"date-time":"2025-12-13T23:09:28Z","timestamp":1765667368008,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":44,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T00:00:00Z","timestamp":1630454400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,9]]},"DOI":"10.1145\/3478384.3478423","type":"proceedings-article","created":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T02:06:13Z","timestamp":1634436373000},"page":"101-108","source":"Crossref","is-referenced-by-count":4,"title":["Similarity Analysis of Visual Sketch-based Search for Sounds"],"prefix":"10.1145","author":[{"given":"Lars","family":"Engeln","sequence":"first","affiliation":[{"name":"Technische Universit\u00e4t Dresden, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nhat Long","family":"Le","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t Dresden, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matthew","family":"McGinity","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t Dresden, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rainer","family":"Groh","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t Dresden, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,15]]},"reference":[{"volume-title":"Audiovisual correspondence between musical timbre and visual shapes. Frontiers in Human Neuroscience 8 (May","year":"2014","author":"Adeli Mohammad","key":"e_1_3_2_1_1_1"},{"key":"e_1_3_2_1_2_1","unstructured":"Kristina Andersen and Peter Knees. 2016. Conversations with Expert Users in Music Retrieval and Research Challenges for Creative MIR.. In ISMIR. 122\u2013128. https:\/\/research.tue.nl\/en\/publications\/conversations-with-expert-users-in-music-retrieval-and-research-c  Kristina Andersen and Peter Knees. 2016. Conversations with Expert Users in Music Retrieval and Research Challenges for Creative MIR.. In ISMIR. 122\u2013128. https:\/\/research.tue.nl\/en\/publications\/conversations-with-expert-users-in-music-retrieval-and-research-c"},{"key":"e_1_3_2_1_3_1","unstructured":"C\u0103t\u0103lina Cangea Petar Veli\u010dkovi\u0107 and Pietro Li\u00f2. 2017. XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification. (2017).  C\u0103t\u0103lina Cangea Petar Veli\u010dkovi\u0107 and Pietro Li\u00f2. 2017. XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification. (2017)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911996.2912000"},{"key":"e_1_3_2_1_5_1","unstructured":"Ya-Xi Chen and Ren\u00e9 Kl\u00fcber. 2010. ThumbnailDJ: Visual Thumbnails of Music Content.. In ISMIR. 565\u2013570.  Ya-Xi Chen and Ren\u00e9 Kl\u00fcber. 2010. ThumbnailDJ: Visual Thumbnails of Music Content.. In ISMIR. 565\u2013570."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1111\/cogs.12791"},{"key":"e_1_3_2_1_7_1","first-page":"17","article-title":"An effective analysis of deep learning based approaches for audio based feature extraction and its visualization","volume":"78","author":"Biswas Rohit","year":"2018","journal-title":"Multimedia Tools and Applications"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Lars Engeln and Rainer Groh. 2020. CoHEARence of audible shapes\u2014a qualitative user study for coherent visual audio design with resynthesized shapes. Personal and Ubiquitous Computing(2020) 1\u201311. https:\/\/doi.org\/10.1007\/s00779-020-01392-5  Lars Engeln and Rainer Groh. 2020. CoHEARence of audible shapes\u2014a qualitative user study for coherent visual audio design with resynthesized shapes. Personal and Ubiquitous Computing(2020) 1\u201311. https:\/\/doi.org\/10.1007\/s00779-020-01392-5","DOI":"10.1007\/s00779-020-01392-5"},{"key":"e_1_3_2_1_9_1","first-page":"1","article-title":"Natural cross-modal mappings between visual and auditory features","volume":"10","author":"Evans K.","year":"2010","journal-title":"Journal of Vision"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654902"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.2197\/ipsjjip.17.292"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Thomas Grill and Arthur Flexer. 2012. Visualization of perceptual qualities in textural sounds. In ICMC. Citeseer.  Thomas Grill and Arthur Flexer. 2012. Visualization of perceptual qualities in textural sounds. In ICMC. Citeseer.","DOI":"10.1145\/2095667.2095677"},{"volume-title":"Neural Information Processing","author":"Guo Xifeng","key":"e_1_3_2_1_13_1"},{"key":"e_1_3_2_1_14_1","unstructured":"David Ha and Douglas Eck. 2017. A Neural Representation of Sketch Drawings. (2017). arXiv:1704.03477v4\u00a0[cs.NE]  David Ha and Douglas Eck. 2017. A Neural Representation of Sketch Drawings. (2017). arXiv:1704.03477v4\u00a0[cs.NE]"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952132"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45453-5_76"},{"volume-title":"Proceedings of the International Conference on Music Information Retrieval. 170\u2013177","year":"2004","author":"Kapur Ajay","key":"e_1_3_2_1_17_1"},{"volume-title":"Database architecture for content-based image retrieval","author":"Kato Toshikazu","key":"e_1_3_2_1_18_1"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911996.2912021"},{"volume-title":"Gestalt psychology. Psychologische Forschung 31, 1","year":"1967","author":"K\u00f6hler Wolfgang","key":"e_1_3_2_1_20_1"},{"volume-title":"Music Icons: Procedural Glyphs for Audio Files. In 2006 19th Brazilian Symposium on Computer Graphics and Image Processing. IEEE. https:\/\/doi.org\/10","year":"2006","author":"Kolhoff Philipp","key":"e_1_3_2_1_21_1"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/354384.354520"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2008.01.025"},{"key":"e_1_3_2_1_24_1","unstructured":"Sebastian L\u00f6bbers Mathieu Barthet and Gy\u00f6rgy Fazekas. 2021. Sketching sounds: an exploratory study on sound-shape associations. arXiv preprint arXiv:2107.07360(2021).  Sebastian L\u00f6bbers Mathieu Barthet and Gy\u00f6rgy Fazekas. 2021. Sketching sounds: an exploratory study on sound-shape associations. arXiv preprint arXiv:2107.07360(2021)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2018.2868887"},{"key":"e_1_3_2_1_26_1","unstructured":"Jiquan Ngiam Aditya Khosla Mingyu Kim Juhan Nam Honglak Lee and Andrew\u00a0Y. Ng. 2011. Multimodal Deep Learning. In ICML. 689\u2013696. https:\/\/icml.cc\/2011\/papers\/399_icmlpaper.pdf  Jiquan Ngiam Aditya Khosla Mingyu Kim Juhan Nam Honglak Lee and Andrew\u00a0Y. Ng. 2011. Multimodal Deep Learning. In ICML. 689\u2013696. https:\/\/icml.cc\/2011\/papers\/399_icmlpaper.pdf"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jecp.2012.05.004"},{"key":"e_1_3_2_1_28_1","unstructured":"Amir\u00a0Hossein Poorjam. 2018. Why we take only 12-13 MFCC coefficients in feature extraction?  Amir\u00a0Hossein Poorjam. 2018. Why we take only 12-13 MFCC coefficients in feature extraction?"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.3765\/bls.v36i1.3926"},{"volume-title":"Proceedings of the European Conference on Computer Vision (ECCV) Workshops.","year":"2018","author":"Suris Didac","key":"e_1_3_2_1_30_1"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CISP.2011.6100457"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2557722"},{"volume-title":"Multi-modal semantic autoencoder for cross-modal retrieval. Neurocomputing 331 (feb","year":"2019","author":"Wu Yiling","key":"e_1_3_2_1_33_1"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2942073"},{"key":"e_1_3_2_1_35_1","unstructured":"Peng Xu Yongye Huang Tongtong Yuan Kaiyue Pang Yi-Zhe Song Tao Xiang Timothy\u00a0M. Hospedales Zhanyu Ma and Jun Guo. 2018. SketchMate: Deep Hashing for Million-Scale Human Sketch Retrieval. (2018). arXiv:1804.01401v1\u00a0[cs.CV]  Peng Xu Yongye Huang Tongtong Yuan Kaiyue Pang Yi-Zhe Song Tao Xiang Timothy\u00a0M. Hospedales Zhanyu Ma and Jun Guo. 2018. SketchMate: Deep Hashing for Million-Scale Human Sketch Retrieval. (2018). arXiv:1804.01401v1\u00a0[cs.CV]"},{"key":"e_1_3_2_1_36_1","unstructured":"Peng Xu Zeyu Song Qiyue Yin Yi-Zhe Song and Liang Wang. 2020. Deep Self-Supervised Representation Learning for Free-Hand Sketch. (2020). arXiv:2002.00867v1  Peng Xu Zeyu Song Qiyue Yin Yi-Zhe Song and Liang Wang. 2020. Deep Self-Supervised Representation Learning for Free-Hand Sketch. (2020). arXiv:2002.00867v1"},{"key":"e_1_3_2_1_37_1","first-page":"2","article-title":"Deep adversarial metric learning for cross-modal retrieval","volume":"22","author":"Xu Xing","year":"2018","journal-title":"World Wide Web"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2017.2690563"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2008.2007345"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3281746","article-title":"Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval","volume":"15","author":"Yu Yi","year":"2019","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3387164"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-44594-3"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/1291233.1291290"},{"volume-title":"Intelligent Computing Theories and Application","author":"Zou Hui","key":"e_1_3_2_1_44_1"}],"event":{"name":"AM '21: Audio Mostly 2021","acronym":"AM '21","location":"virtual\/Trento Italy"},"container-title":["Audio Mostly 2021"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3478384.3478423","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3478384.3478423","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:04Z","timestamp":1750188604000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3478384.3478423"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9]]},"references-count":44,"alternative-id":["10.1145\/3478384.3478423","10.1145\/3478384"],"URL":"https:\/\/doi.org\/10.1145\/3478384.3478423","relation":{},"subject":[],"published":{"date-parts":[[2021,9]]}}}