{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,25]],"date-time":"2026-06-25T02:02:07Z","timestamp":1782352927400,"version":"3.54.5"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,6,5]],"date-time":"2018-06-05T00:00:00Z","timestamp":1528156800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,6,5]]},"DOI":"10.1145\/3206025.3206046","type":"proceedings-article","created":{"date-parts":[[2018,6,11]],"date-time":"2018-06-11T12:36:20Z","timestamp":1528720580000},"page":"353-361","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":40,"title":["CBVMR"],"prefix":"10.1145","author":[{"given":"Sungeun","family":"Hong","sequence":"first","affiliation":[{"name":"SK T-Brain, Seoul, South Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Woobin","family":"Im","sequence":"additional","affiliation":[{"name":"Korea Advanced Institute of Science and Technology, Daejeon, South Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hyun S.","family":"Yang","sequence":"additional","affiliation":[{"name":"Korea Advanced Institute of Science and Technology, Daejeon, South Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2018,6,5]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Youtube-8m: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675","author":"Abu-El-Haija Sami","year":"2016","unstructured":"Sami Abu-El-Haija , Nisarg Kothari , Joonseok Lee , Paul Natsev , George Toderici , Balakrishnan Varadarajan , and Sudheendra Vijayanarasimhan . 2016. 
Youtube-8m: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675 ( 2016 ). Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. 2016. Youtube-8m: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675 (2016)."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-04114-8_26"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00138-013-0505-1"},{"key":"e_1_3_2_1_4_1","volume-title":"Nando De Freitas, and Kejie Bao","author":"Brochu Eric","year":"2003","unstructured":"Eric Brochu , Nando De Freitas, and Kejie Bao . 2003 . The sound of an album cover: Probabilistic multimedia and IR Workshop on Artificial Intelligence and Statistics . Eric Brochu, Nando De Freitas, and Kejie Bao. 2003. The sound of an album cover: Probabilistic multimedia and IR Workshop on Artificial Intelligence and Statistics."},{"key":"e_1_3_2_1_5_1","volume-title":"ISWC2011 (October","author":"Chao Jiansong","year":"2011","unstructured":"Jiansong Chao , Haofen Wang , Wenlei Zhou , Weinan Zhang , and Yong Yu . 2011 . Tunesensor: A semantic-driven music recommendation service for digital photo albums Proceedings of the 10th International Semantic Web Conference . ISWC2011 (October 2011). Jiansong Chao, Haofen Wang, Wenlei Zhou, Weinan Zhang, and Yong Yu. 2011. Tunesensor: A semantic-driven music recommendation service for digital photo albums Proceedings of the 10th International Semantic Web Conference. ISWC2011 (October 2011)."},{"key":"e_1_3_2_1_6_1","volume-title":"CVPR 2009. IEEE Conference on. IEEE, 248--255","author":"Deng Jia","year":"2009","unstructured":"Jia Deng , Wei Dong , Richard Socher , Li-Jia Li , Kai Li , and Li Fei-Fei . 2009 . Imagenet: A large-scale hierarchical image database Computer Vision and Pattern Recognition, 2009 . CVPR 2009. IEEE Conference on. IEEE, 248--255 . 
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 248--255."},{"key":"e_1_3_2_1_7_1","volume-title":"et almbox","author":"Frome Andrea","year":"2013","unstructured":"Andrea Frome , Greg S Corrado , Jon Shlens , Samy Bengio , Jeff Dean , Tomas Mikolov , et al. 2013 . Devise : A deep visual-semantic embedding model. In Advances in neural information processing systems. 2121--2129. Andrea Frome, Greg S Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Tomas Mikolov, et al. 2013. Devise: A deep visual-semantic embedding model. In Advances in neural information processing systems. 2121--2129."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2010.2098858"},{"key":"e_1_3_2_1_9_1","first-page":"1","article-title":"Domain-adversarial training of neural networks","volume":"17","author":"Ganin Yaroslav","year":"2016","unstructured":"Yaroslav Ganin , Evgeniya Ustinova , Hana Ajakan , Pascal Germain , Hugo Larochelle , Fran\u00e7ois Laviolette , Mario Marchand , and Victor Lempitsky . 2016 . Domain-adversarial training of neural networks . Journal of Machine Learning Research Vol. 17 , 59 (2016), 1 -- 35 . Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, Fran\u00e7ois Laviolette, Mario Marchand, and Victor Lempitsky. 2016. Domain-adversarial training of neural networks. Journal of Machine Learning Research Vol. 17, 59 (2016), 1--35.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1162\/0899766042321814"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_12_1","volume-title":"2017 a. SSPP-DAN: Deep Domain Adaptation Network for Face Recognition with Single Sample Per Person. 
arXiv preprint arXiv:1702.04069","author":"Hong Sungeun","year":"2017","unstructured":"Sungeun Hong , Woobin Im , Jongbin Ryu , and Hyun S Yang . 2017 a. SSPP-DAN: Deep Domain Adaptation Network for Face Recognition with Single Sample Per Person. arXiv preprint arXiv:1702.04069 ( 2017 ). Sungeun Hong, Woobin Im, Jongbin Ryu, and Hyun S Yang. 2017 a. SSPP-DAN: Deep Domain Adaptation Network for Face Recognition with Single Sample Per Person. arXiv preprint arXiv:1702.04069 (2017)."},{"key":"e_1_3_2_1_13_1","volume-title":"2017 b. Recognizing Dynamic Scenes with Deep Dual Descriptor based on Key Frames and Key Segments. arXiv preprint arXiv:1702.04479","author":"Hong Sungeun","year":"2017","unstructured":"Sungeun Hong , Jongbin Ryu , Woobin Im , and Hyun S Yang . 2017 b. Recognizing Dynamic Scenes with Deep Dual Descriptor based on Key Frames and Key Segments. arXiv preprint arXiv:1702.04479 ( 2017 ). Sungeun Hong, Jongbin Ryu, Woobin Im, and Hyun S Yang. 2017 b. Recognizing Dynamic Scenes with Deep Dual Descriptor based on Key Frames and Key Segments. arXiv preprint arXiv:1702.04479 (2017)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11045-016-0463-7"},{"key":"e_1_3_2_1_15_1","volume-title":"Principal component analysis","author":"Jolliffe Ian","unstructured":"Ian Jolliffe . 2002. Principal component analysis . Wiley Online Library . Ian Jolliffe. 2002. Principal component analysis. Wiley Online Library."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Andrej Karpathy and Li Fei-Fei. 2015. Deep visual-semantic alignments for generating image descriptions Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3128--3137.  Andrej Karpathy and Li Fei-Fei. 2015. Deep visual-semantic alignments for generating image descriptions Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 
3128--3137.","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"e_1_3_2_1_17_1","unstructured":"Andrej Karpathy Armand Joulin and Fei Fei F Li. 2014. Deep fragment embeddings for bidirectional image sentence mapping Advances in neural information processing systems. 1889--1897.   Andrej Karpathy Armand Joulin and Fei Fei F Li. 2014. Deep fragment embeddings for bidirectional image sentence mapping Advances in neural information processing systems. 1889--1897."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1186\/1687-4722-2013-15"},{"key":"e_1_3_2_1_19_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik","year":"2014","unstructured":"Diederik Kingma and Jimmy Ba . 2014 . Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_20_1","volume-title":"Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv 1411.2539","author":"Kiros Ryan","year":"2014","unstructured":"Ryan Kiros , Ruslan Salakhutdinov , and Richard S Zemel . 2014. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv 1411.2539 ( 2014 ). Ryan Kiros, Ruslan Salakhutdinov, and Richard S Zemel. 2014. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv 1411.2539 (2014)."},{"key":"e_1_3_2_1_21_1","volume-title":"Melody Extraction on Vocal Segments Using Multi-Column Deep Neural Networks The International Society for Music Information Retrieval (ISMIR)","author":"Kum Sangeun","year":"2016","unstructured":"Sangeun Kum , Changheun Oh , and Juhan Nam . 2016. Melody Extraction on Vocal Segments Using Multi-Column Deep Neural Networks The International Society for Music Information Retrieval (ISMIR) , 2016 . ISMIR. 
Sangeun Kum, Changheun Oh, and Juhan Nam. 2016. Melody Extraction on Vocal Segments Using Multi-Column Deep Neural Networks The International Society for Music Information Retrieval (ISMIR), 2016. ISMIR."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/MMUL.2011.1"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2393347.2396496"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_1_25_1","volume-title":"Analysing the similarity of album art with self-organising maps International Workshop on Self-Organizing Maps","author":"Mayer Rudolf","unstructured":"Rudolf Mayer . 2011. Analysing the similarity of album art with self-organising maps International Workshop on Self-Organizing Maps . Springer , 357--366. Rudolf Mayer. 2011. Analysing the similarity of album art with self-organising maps International Workshop on Self-Organizing Maps. Springer, 357--366."},{"key":"e_1_3_2_1_26_1","volume-title":"Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), 2011","author":"M\u00fcller Meinard","year":"2011","unstructured":"Meinard M\u00fcller and Sebastian Ewert . 2011 . Chroma Toolbox: MATLAB implementations for extracting variants of chroma-based audio features . In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), 2011 . hal-00727791, version 2--22 Oct 2012. Citeseer. Meinard M\u00fcller and Sebastian Ewert. 2011. Chroma Toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), 2011. hal-00727791, version 2--22 Oct 2012. Citeseer."},{"key":"e_1_3_2_1_27_1","volume-title":"Image-Text Multi-Modal Representation Learning by Adversarial Backpropagation. arXiv preprint arXiv:1612.08354","author":"Park Gwangbeen","year":"2016","unstructured":"Gwangbeen Park and Woobin Im. 2016. 
Image-Text Multi-Modal Representation Learning by Adversarial Backpropagation. arXiv preprint arXiv:1612.08354 ( 2016 ). Gwangbeen Park and Woobin Im. 2016. Image-Text Multi-Modal Representation Learning by Adversarial Backpropagation. arXiv preprint arXiv:1612.08354 (2016)."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2015.2419081"},{"key":"e_1_3_2_1_29_1","volume-title":"Affective Music Recommendation System Based on the Mood of Input Video International Conference on Multimedia Modeling. Springer, 299--302","author":"Sasaki Shoto","year":"2015","unstructured":"Shoto Sasaki , Tatsunori Hirai , Hayato Ohya , and Shigeo Morishima . 2015 . Affective Music Recommendation System Based on the Mood of Input Video International Conference on Multimedia Modeling. Springer, 299--302 . Shoto Sasaki, Tatsunori Hirai, Hayato Ohya, and Shigeo Morishima. 2015. Affective Music Recommendation System Based on the Mood of Input Video International Conference on Multimedia Modeling. Springer, 299--302."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-16354-3_8"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654919"},{"key":"e_1_3_2_1_32_1","volume-title":"Kernel methods for pattern analysis","author":"Shawe-Taylor John","unstructured":"John Shawe-Taylor and Nello Cristianini . 2004. Kernel methods for pattern analysis . Cambridge university press . John Shawe-Taylor and Nello Cristianini. 2004. Kernel methods for pattern analysis. Cambridge university press."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_1_34_1","unstructured":"Aaron Van den Oord Sander Dieleman and Benjamin Schrauwen. 2013. Deep content-based music recommendation. In Advances in neural information processing systems. 2643--2651.   Aaron Van den Oord Sander Dieleman and Benjamin Schrauwen. 2013. Deep content-based music recommendation. 
In Advances in neural information processing systems. 2643--2651."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Liwei Wang Yin Li and Svetlana Lazebnik. 2016. Learning deep structure-preserving image-text embeddings Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5005--5013.  Liwei Wang Yin Li and Svetlana Lazebnik. 2016. Learning deep structure-preserving image-text embeddings Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5005--5013.","DOI":"10.1109\/CVPR.2016.541"},{"key":"e_1_3_2_1_36_1","volume-title":"Learning Two-Branch Neural Networks for Image-Text Matching Tasks. arXiv preprint arXiv:1704.03470","author":"Wang Liwei","year":"2017","unstructured":"Liwei Wang , Yin Li , and Svetlana Lazebnik . 2017. Learning Two-Branch Neural Networks for Image-Text Matching Tasks. arXiv preprint arXiv:1704.03470 ( 2017 ). Liwei Wang, Yin Li, and Svetlana Lazebnik. 2017. Learning Two-Branch Neural Networks for Image-Text Matching Tasks. arXiv preprint arXiv:1704.03470 (2017)."},{"key":"e_1_3_2_1_37_1","volume-title":"mbox","author":"Wegelin Jacob A","year":"2000","unstructured":"Jacob A Wegelin et al. 2000 . A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. University of Washington, Department of Statistics , Tech. Rep (2000). Jacob A Wegelin et al. 2000. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. University of Washington, Department of Statistics, Tech. 
Rep (2000)."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/1577069.1577078"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5591\/978-1-57735-516-8\/IJCAI11-460"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2557722"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00166"}],"event":{"name":"ICMR '18: International Conference on Multimedia Retrieval","location":"Yokohama Japan","acronym":"ICMR '18","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3206025.3206046","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3206025.3206046","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:08:15Z","timestamp":1750208895000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3206025.3206046"}},"subtitle":["Content-Based Video-Music Retrieval Using Soft Intra-Modal Structure Constraint"],"short-title":[],"issued":{"date-parts":[[2018,6,5]]},"references-count":41,"alternative-id":["10.1145\/3206025.3206046","10.1145\/3206025"],"URL":"https:\/\/doi.org\/10.1145\/3206025.3206046","relation":{},"subject":[],"published":{"date-parts":[[2018,6,5]]},"assertion":[{"value":"2018-06-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}