{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T18:38:17Z","timestamp":1768675097168,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,6,19]],"date-time":"2020-06-19T00:00:00Z","timestamp":1592524800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,6,19]]},"DOI":"10.1145\/3408127.3408137","type":"proceedings-article","created":{"date-parts":[[2020,9,11]],"date-time":"2020-09-11T03:31:13Z","timestamp":1599795073000},"page":"155-159","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":29,"title":["Music Genre Classification with Transformer Classifier"],"prefix":"10.1145","author":[{"given":"Yingying","family":"Zhuang","sequence":"first","affiliation":[{"name":"Beijing University of Posts and Telecommunications, Beijing, China"}]},{"given":"Yuezhang","family":"Chen","sequence":"additional","affiliation":[{"name":"Beijing University of Posts and Telecommunications, Beijing, China"}]},{"given":"Jie","family":"Zheng","sequence":"additional","affiliation":[{"name":"Beijing University of Posts and Telecommunications, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2020,9,10]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Lostanlen V. and Cella C. E. 2016. Deep convolutional networks on the pitch spiral for musical instrument recognition. arXiv preprint arXiv:1605.06644.  Lostanlen V. and Cella C. E. 2016. Deep convolutional networks on the pitch spiral for musical instrument recognition. arXiv preprint arXiv:1605.06644."},{"key":"e_1_3_2_1_2_1","unstructured":"Choi K. Fazekas G. and Sandler M. 
2016. Automatic tagging using deep convolutional neural networks. arXiv preprint arXiv:1606.00298.  Choi K. Fazekas G. and Sandler M. 2016. Automatic tagging using deep convolutional neural networks. arXiv preprint arXiv:1606.00298."},{"key":"e_1_3_2_1_3_1","unstructured":"Dorfer M. Arzt A. B\u00f6ck S. Durand A. and Widmer G. 2016. Live score following on sheet music images. arXiv preprint arXiv:1612.05076.  Dorfer M. Arzt A. B\u00f6ck S. Durand A. and Widmer G. 2016. Live score following on sheet music images. arXiv preprint arXiv:1612.05076."},{"key":"e_1_3_2_1_4_1","volume-title":"Melody Extraction on Vocal Segments Using Multi-Column Deep Neural Networks. In ISMIR (August","author":"Kum S.","year":"2016","unstructured":"Kum , S. , Oh , C. , and Nam , J . 2016 . Melody Extraction on Vocal Segments Using Multi-Column Deep Neural Networks. In ISMIR (August , 2016 ), 819--825. Kum, S., Oh, C., and Nam, J. 2016. Melody Extraction on Vocal Segments Using Multi-Column Deep Neural Networks. In ISMIR (August, 2016), 819--825."},{"key":"e_1_3_2_1_5_1","unstructured":"Li P. Qian J. and Wang T. 2015. Automatic instrument recognition in polyphonic music using convolutional neural networks. arXiv preprint arXiv:1511.05520. R. Mayer R.  Li P. Qian J. and Wang T. 2015. Automatic instrument recognition in polyphonic music using convolutional neural networks. arXiv preprint arXiv:1511.05520. R. Mayer R."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1459359.1459382"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1816123.1816146"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/EUSIPCO.2015.7362591"},{"key":"e_1_3_2_1_9_1","volume-title":"Learning Sparse Feature Representations for Music Annotation and Retrieval. In ISMIR (October","author":"Nam J.","year":"2012","unstructured":"Nam , J. , Herrera , J. , Slaney , M. , and Smith , J. O . 2012 . 
Learning Sparse Feature Representations for Music Annotation and Retrieval. In ISMIR (October , 2012 ), 565--570. Nam, J., Herrera, J., Slaney, M., and Smith, J. O. 2012. Learning Sparse Feature Representations for Music Annotation and Retrieval. In ISMIR (October, 2012), 565--570."},{"key":"e_1_3_2_1_10_1","volume-title":"Symbolic and Cultural Features. In ISMIR (August","author":"McKay C.","year":"2010","unstructured":"McKay , C. , Burgoyne , J. A. , Hockman , J. , Smith , J. B. , Vigliensoni , G. , and Fujinaga , I . 2010. Evaluating the Genre Classification Performance of Lyrical Features Relative to Audio , Symbolic and Cultural Features. In ISMIR (August , 2010 ), 213--218. McKay, C., Burgoyne, J. A., Hockman, J., Smith, J. B., Vigliensoni, G., and Fujinaga, I. 2010. Evaluating the Genre Classification Performance of Lyrical Features Relative to Audio, Symbolic and Cultural Features. In ISMIR (August, 2010), 213--218."},{"key":"e_1_3_2_1_11_1","volume-title":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (June","author":"Senac C.","year":"2017","unstructured":"Senac , C. , Pellegrini , T. , Mouret , F. , and Pinquier , J . 2017. Music feature maps with convolutional neural networks for music genre classification . In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (June , 2017 ), 19. Senac, C., Pellegrini, T., Mouret, F., and Pinquier, J. 2017. Music feature maps with convolutional neural networks for music genre classification. In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (June, 2017), 19."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Fu Z. Lu G. Ting K. M. and Zhang D. 2010. A survey of audio-based music classification and annotation. IEEE transactions on multimedia 13(2) 303--319.  Fu Z. Lu G. Ting K. M. and Zhang D. 2010. A survey of audio-based music classification and annotation. 
IEEE transactions on multimedia 13(2) 303--319.","DOI":"10.1109\/TMM.2010.2098858"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Bergstra J. Casagrande N. Erhan D. Eck D. and K\u00e9gl B. 2006. Aggregate features and a da b oost for music classification. Machine learning 65(2-3) 473--484.  Bergstra J. Casagrande N. Erhan D. Eck D. and K\u00e9gl B. 2006. Aggregate features and a da b oost for music classification. Machine learning 65(2-3) 473--484.","DOI":"10.1007\/s10994-006-9019-7"},{"key":"e_1_3_2_1_14_1","volume-title":"Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (July","author":"Li T.","year":"2013","unstructured":"Li , T. , Ogihara , M. , and Li , Q . 2003. A comparative study on content-based music genre classification . In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (July , 2013 ), 282--289. Li, T., Ogihara, M., and Li, Q. 2003. A comparative study on content-based music genre classification. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (July, 2013), 282--289."},{"key":"e_1_3_2_1_15_1","unstructured":"Lee H. Pham P. Largman Y. and Ng A. Y. 2009. Unsupervised feature learning for audio classification using convolutional deep belief networks. In Advances in neural information processing systems 1096--1104.  Lee H. Pham P. Largman Y. and Ng A. Y. 2009. Unsupervised feature learning for audio classification using convolutional deep belief networks. 
In Advances in neural information processing systems 1096--1104."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6854949"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952585"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7177944"},{"key":"e_1_3_2_1_19_1","unstructured":"Kim Y. Denton C. Hoang L. and Rush A. M. 2017. Structured attention networks. arXiv preprint arXiv:1702.00887.  Kim Y. Denton C. Hoang L. and Rush A. M. 2017. Structured attention networks. arXiv preprint arXiv:1702.00887."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Cheng J. Dong L. and Lapata M. 2016. Long short-term memory-networks for machine reading. arXiv preprint arXiv:1601.06733.  Cheng J. Dong L. and Lapata M. 2016. Long short-term memory-networks for machine reading. arXiv preprint arXiv:1601.06733.","DOI":"10.18653\/v1\/D16-1053"},{"key":"e_1_3_2_1_21_1","unstructured":"Paulus R. Xiong C. and Socher R. 2017. A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304.  Paulus R. Xiong C. and Socher R. 2017. A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304."},{"key":"e_1_3_2_1_22_1","unstructured":"Vaswani A. Shazeer N. Parmar N. Uszkoreit J. Jones L. Gomez A. N. and Polosukhin I. 2017. Attention is all you need. In Advances in neural information processing systems 5998-6008).  Vaswani A. Shazeer N. Parmar N. Uszkoreit J. Jones L. Gomez A. N. and Polosukhin I. 2017. Attention is all you need. In Advances in neural information processing systems 5998-6008)."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.2307\/3679550"},{"key":"e_1_3_2_1_24_1","unstructured":"Choi K. Fazekas G. Cho K. and Sandler M. 2017. A tutorial on deep learning for music information retrieval. arXiv preprint arXiv:1709.04396.  Choi K. Fazekas G. Cho K. and Sandler M. 2017. 
A tutorial on deep learning for music information retrieval. arXiv preprint arXiv:1709.04396."},{"key":"e_1_3_2_1_25_1","volume-title":"Sixteenth Annual Conference of the International Speech Communication Association.","author":"Sainath T. N.","unstructured":"Sainath , T. N. , Weiss , R. J. , Senior , A. , Wilson , K. W. , and Vinyals , O . 2015. Learning the speech front-end with raw waveform CLDNNs . In Sixteenth Annual Conference of the International Speech Communication Association. Sainath, T. N., Weiss, R. J., Senior, A., Wilson, K. W., and Vinyals, O. 2015. Learning the speech front-end with raw waveform CLDNNs. In Sixteenth Annual Conference of the International Speech Communication Association."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6854950"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Sejdi\u0107 E. Djurovi\u0107 I. and Jiang J. 2009. Time--frequency feature representation using energy concentration: An overview of recent advances. Digital signal processing 19(1) 153--183.  Sejdi\u0107 E. Djurovi\u0107 I. and Jiang J. 2009. Time--frequency feature representation using energy concentration: An overview of recent advances. Digital signal processing 19(1) 153--183.","DOI":"10.1016\/j.dsp.2007.12.004"},{"key":"e_1_3_2_1_28_1","series-title":"Vol. 2","volume-title":"Information retrieval for music and motion","author":"M\u00fcller M.","unstructured":"M\u00fcller , M. 2007. Information retrieval for music and motion ( Vol. 2 ) . Heidelberg : Springer , 65. M\u00fcller, M. 2007. Information retrieval for music and motion (Vol. 2). Heidelberg: Springer, 65."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-22482-4_50"},{"key":"e_1_3_2_1_30_1","volume-title":"ISMIR (October","author":"Huang P. S.","year":"2014","unstructured":"Huang , P. S. , Kim , M. , Hasegawa-Johnson , M. , and Smaragdis , P . 2014. 
Singing-Voice Separation from Monaural Recordings using Deep Recurrent Neural Networks . In ISMIR (October , 2014 ), 477--482. Huang, P. S., Kim, M., Hasegawa-Johnson, M., and Smaragdis, P. 2014. Singing-Voice Separation from Monaural Recordings using Deep Recurrent Neural Networks. In ISMIR (October, 2014), 477--482."},{"key":"e_1_3_2_1_31_1","volume-title":"Proceedings of the International Society for Music Information Retrieval Conference (October","author":"Choi K.","year":"2015","unstructured":"Choi , K. , Fazekas , G. , Sandler , M. , and Kim , J . 2015. Auralisation of deep convolutional neural networks: Listening to learned features . In Proceedings of the International Society for Music Information Retrieval Conference (October , 2015 ). Malaga, Spain, 26--30. Choi, K., Fazekas, G., Sandler, M., and Kim, J. 2015. Auralisation of deep convolutional neural networks: Listening to learned features. In Proceedings of the International Society for Music Information Retrieval Conference (October, 2015). Malaga, Spain, 26--30."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1915893"},{"key":"e_1_3_2_1_33_1","volume-title":"An introduction to the psychology of hearing","author":"Moore B. C.","unstructured":"Moore , B. C. 2012. An introduction to the psychology of hearing . Brill . Moore, B. C. 2012. An introduction to the psychology of hearing. Brill."},{"key":"e_1_3_2_1_34_1","volume-title":"14th International Society for Music Information Retrieval Conference (ISMIR-2013)","author":"Dieleman S.","unstructured":"Dieleman , S. , and Schrauwen , B . 2013. Multiscale approaches to music audio feature learning . In 14th International Society for Music Information Retrieval Conference (ISMIR-2013) , 116--121. Dieleman, S., and Schrauwen, B. 2013. Multiscale approaches to music audio feature learning. 
In 14th International Society for Music Information Retrieval Conference (ISMIR-2013), 116--121."},{"key":"e_1_3_2_1_35_1","unstructured":"Van den Oord A. Dieleman S. and Schrauwen B. 2013. Deep content-based music recommendation. In Advances in neural information processing systems 2643--2651.  Van den Oord A. Dieleman S. and Schrauwen B. 2013. Deep content-based music recommendation. In Advances in neural information processing systems 2643--2651."},{"key":"e_1_3_2_1_36_1","volume-title":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (June","author":"Senac C.","year":"2017","unstructured":"Senac , C. , Pellegrini , T. , Mouret , F. , and Pinquier , J . 2017. Music feature maps with convolutional neural networks for music genre classification . In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (June , 2017 ), 19 Senac, C., Pellegrini, T., Mouret, F., and Pinquier, J. 2017. Music feature maps with convolutional neural networks for music genre classification. In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (June, 2017), 19"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Bergstra J. Casagrande N. Erhan D. Eck D. and K\u00e9gl B. 2006. Aggregate features and ADABOOST for music classification. Machine learning 65(2-3) 473--484.  Bergstra J. Casagrande N. Erhan D. Eck D. and K\u00e9gl B. 2006. Aggregate features and ADABOOST for music classification. Machine learning 65(2-3) 473--484.","DOI":"10.1007\/s10994-006-9019-7"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW.2016.0078"},{"key":"e_1_3_2_1_39_1","volume-title":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (June","author":"Senac C.","year":"2017","unstructured":"Senac , C. , Pellegrini , T. , Mouret , F. , and Pinquier , J . 2017. 
Music feature maps with convolutional neural networks for music genre classification . In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (June , 2017 ), 19 Senac, C., Pellegrini, T., Mouret, F., and Pinquier, J. 2017. Music feature maps with convolutional neural networks for music genre classification. In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing (June, 2017), 19"},{"key":"e_1_3_2_1_40_1","unstructured":"de Eguino M. F. R. 2016. Deep Music Genre.  de Eguino M. F. R. 2016. Deep Music Genre."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2002.800560"}],"event":{"name":"ICDSP 2020: 2020 4th International Conference on Digital Signal Processing","location":"Chengdu China","acronym":"ICDSP 2020","sponsor":["University of Electronic Science and Technology of China"]},"container-title":["Proceedings of the 2020 4th International Conference on Digital Signal 
Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3408127.3408137","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3408127.3408137","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:34Z","timestamp":1750197694000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3408127.3408137"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,19]]},"references-count":41,"alternative-id":["10.1145\/3408127.3408137","10.1145\/3408127"],"URL":"https:\/\/doi.org\/10.1145\/3408127.3408137","relation":{},"subject":[],"published":{"date-parts":[[2020,6,19]]},"assertion":[{"value":"2020-09-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}