{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:09:49Z","timestamp":1750219789712,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":21,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,8,11]],"date-time":"2023-08-11T00:00:00Z","timestamp":1691712000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,8,11]]},"DOI":"10.1145\/3617695.3617728","type":"proceedings-article","created":{"date-parts":[[2023,11,2]],"date-time":"2023-11-02T22:10:15Z","timestamp":1698963015000},"page":"28-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["An Attention-based Audio-visual Fusion Method for Short Video Classification"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-2131-9027","authenticated-orcid":false,"given":"Hong Liang","family":"Dai","sequence":"first","affiliation":[{"name":"Yang Zhou University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6843-4276","authenticated-orcid":false,"given":"Xingfeng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Yang Zhou University, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-0962-2950","authenticated-orcid":false,"given":"Haiyang","family":"Yu","sequence":"additional","affiliation":[{"name":"Yang Zhou University, 
China"}]}],"member":"320","published-online":{"date-parts":[[2023,11,2]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.31681\/jetol.457046"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.3233\/JIFS-169369"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.23884\/ejt.2017.7.2.11"},{"key":"e_1_3_2_1_4_1","unstructured":"Hospedales T. Antoniou A. Micaelli P. and Storkey A. 2021. Meta-learning in neural networks: A survey. IEEE transactions on pattern analysis and machine intelligence 44(9): 5149-5169. http:\/\/10.1109\/TPAMI.2021.3079209"},{"key":"e_1_3_2_1_5_1","unstructured":"Xiangrong Z. and Fang L. 2002. A pattern classification method based on GA and SVM. Paper presented at the 6th International Conference on Signal Processing 2002."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.17706\/IJCEE.2016.8.2.177-184"},{"key":"e_1_3_2_1_7_1","volume-title":"IFAC Proceedings Volumes, 46(30)","author":"Blondel P.","year":"2013","unstructured":"Blondel, P., Potelle, A., P\u00e9gard, P. C. and Lozano, P. R. 2013. How to improve the HOG detector in the UAV context. IFAC Proceedings Volumes, 46(30): 46-51. http:\/\/10.3182\/20131120-3-FR-4045.00009"},{"key":"e_1_3_2_1_8_1","unstructured":"Tomasi C. 2012. Histograms of oriented gradients. Computer Vision Sampler: 1-6"},
{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Long X. Gan C. De Melo G. Wu J. Liu X. and Wen S. 2018. Attention clusters: Purely attention based local feature integration for video classification. Paper presented at the Proceedings of the IEEE conference on computer vision and pattern recognition. http:\/\/10.1109\/CVPR.2018.00817","DOI":"10.1109\/CVPR.2018.00817"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2018.09.003"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Tran D. Bourdev L. Fergus R. Torresani L. and Paluri M. 2015. Learning spatiotemporal features with 3d convolutional networks. Paper presented at the Proceedings of the IEEE international conference on computer vision. http:\/\/10.1109\/ICCV.2015.510","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Carreira J. and Zisserman A. 2017. Quo vadis action recognition? a new model and the kinetics dataset. Paper presented at the proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","DOI":"10.1109\/CVPR.2017.502"},
{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Hara K. Kataoka H. and Satoh Y. 2018. Can spatiotemporal 3d cnns retrace the history of 2d cnns and imagenet? Paper presented at the Proceedings of the IEEE conference on Computer Vision and Pattern Recognition.","DOI":"10.1109\/CVPR.2018.00685"},{"key":"e_1_3_2_1_14_1","unstructured":"Opitz J. and Burst S. 2019. Macro f1 and macro f1. arXiv preprint arXiv:1911.03347"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Lin T.-Y. Goyal P. Girshick R. He K. and Doll\u00e1r P. 2017. Focal loss for dense object detection. Paper presented at the Proceedings of the IEEE international conference on computer vision. http:\/\/10.1109\/TPAMI.2018.2858826","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.3390\/s18124308"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Jiang Y.-G. Ye G. Chang S.-F. Ellis D. and Loui A. C. 2011. Consumer video understanding: A benchmark database and an evaluation of human and machine performance. Paper presented at the Proceedings of the 1st ACM International Conference on Multimedia Retrieval. http:\/\/10.1145\/1991996.1992025","DOI":"10.1145\/1991996.1992025"},
{"key":"e_1_3_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Fonseca E. Plakal M. Font F. Ellis D. P. and Serra X. 2019. Audio tagging with noisy labels and minimal supervision. arXiv preprint arXiv:1906.02975","DOI":"10.33682\/w13e-5v06"},{"key":"e_1_3_2_1_19_1","unstructured":"Lin M. Chen Q. and Yan S. 2013. Network in network. arXiv preprint arXiv:1312.4400"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Feichtenhofer C. Fan H. Malik J. and He K. 2019. Slowfast networks for video recognition. Paper presented at the Proceedings of the IEEE\/CVF international conference on computer vision. http:\/\/10.1109\/ICCV.2019.00630","DOI":"10.1109\/ICCV.2019.00630"},{"key":"e_1_3_2_1_21_1","unstructured":"Li Y. Wu C.-Y. Fan H. Mangalam K. Xiong B. Malik J. and Feichtenhofer C. 2021. Improved multiscale vision transformers for classification and detection. arXiv preprint arXiv:2112.01526. 
http:\/\/10.1109\/CVPR52688.2022.00476"}],"event":{"name":"BDIOT 2023: 2023 7th International Conference on Big Data and Internet of Things","acronym":"BDIOT 2023","location":"Beijing China"},"container-title":["Proceedings of the 2023 7th International Conference on Big Data and Internet of Things"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3617695.3617728","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3617695.3617728","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:57Z","timestamp":1750178277000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3617695.3617728"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,11]]},"references-count":21,"alternative-id":["10.1145\/3617695.3617728","10.1145\/3617695"],"URL":"https:\/\/doi.org\/10.1145\/3617695.3617728","relation":{},"subject":[],"published":{"date-parts":[[2023,8,11]]},"assertion":[{"value":"2023-11-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}