{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:22:44Z","timestamp":1750220564403,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":23,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,5,28]],"date-time":"2021-05-28T00:00:00Z","timestamp":1622160000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,5,28]]},"DOI":"10.1145\/3469213.3470254","type":"proceedings-article","created":{"date-parts":[[2021,8,19]],"date-time":"2021-08-19T04:06:33Z","timestamp":1629345993000},"page":"1-5","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Audio-Visual Saliency Network with Audio Attention Module"],"prefix":"10.1145","author":[{"given":"Shuaiyang","family":"Cheng","sequence":"first","affiliation":[{"name":"Xiamen University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xing","family":"Gao","sequence":"additional","affiliation":[{"name":"Xiamen University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liang","family":"Song","sequence":"additional","affiliation":[{"name":"Xiamen University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianbing","family":"Xiahou","sequence":"additional","affiliation":[{"name":"Xiamen University, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,8,18]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00482"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00144"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2015.7301396"},{"issue":"4","key":"e_1_3_2_1_4_1","first-page":"565","article-title":"Comparison of local feature extraction paradigms applied to visual slam","volume":"20","author":"Trujillo L.","year":"2016","unstructured":"L\u00f3pez-L\u00f3pez, L. Trujillo, P. Legrand, V. D\u00edaz-Ram\u00edrez and G. Olague, \u201cComparison of local feature extraction paradigms applied to visual slam,\u201d Computacion y Sistemas, vol. 20, no. 4, pp. 565-588, 2016.","journal-title":"Computacion y Sistemas"},{"key":"e_1_3_2_1_5_1","volume-title":"Attention is all you need","author":"Vaswani A.","year":"2017","unstructured":"A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, \u201cAttention is all you need,\u201d 2017, arXiv:1706.03762."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01047"},{"key":"e_1_3_2_1_7_1","volume-title":"DAVE: A deep audio-visual embedding for dynamic saliency prediction","author":"Tavakoli H.","year":"2019","unstructured":"H. R. Tavakoli, A. Borji, E. 
Rahtu, and J. Kannala, \u201cDAVE: A deep audio-visual embedding for dynamic saliency prediction,\u201d 2019, arXiv:1905.10693."},{"key":"e_1_3_2_1_8_1","volume-title":"AViNet: Diving deep into audio-visual saliency prediction","author":"Jain S.","year":"2020","unstructured":"S. Jain, P. Yarlagadda, R. Subramanian, V. Gandhi, \u201cAViNet: Diving deep into audio-visual saliency prediction,\u201d 2020, arXiv:2012.06170."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11928"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"e_1_3_2_1_12_1","volume-title":"Attention-based models for speech recognition","author":"Chorowski J.","year":"2015","unstructured":"J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho and Y. Bengio, \u201cAttention-based models for speech recognition,\u201d 2015, arXiv:1506.07503."},{"key":"e_1_3_2_1_13_1","first-page":"6546","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition.","author":"Hara K.","year":"2018","unstructured":"K. Hara, H. Kataoka and Y. Satoh, \u201cCan spatial temporal 3d cnn retrace the history of 2d cnns and imagenet?,\u201d in Proceedings of the IEEE conference on computer vision and pattern recognition. 2018, pp. 
6546-6555."},{"key":"e_1_3_2_1_14_1","volume-title":"Soundnet: Learning sound representations from unlabeled video","author":"Aytar Y.","year":"2016","unstructured":"Y. Aytar, C. Vondrick and A. Torralba, \u201cSoundnet: Learning sound representations from unlabeled video,\u201d 2016, arXiv:1610.09001."},{"key":"e_1_3_2_1_15_1","volume-title":"The kinetics human action video dataset","author":"Kay W.","year":"2018","unstructured":"W. Kay, J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vijayanarasimhan, et al., \u201cThe kinetics human action video dataset,\u201d 2018, arXiv:1705.06950."},{"volume-title":"What do different evaluation metrics tell us about saliency models?","author":"Bylinskii Z.","key":"e_1_3_2_1_16_1","unstructured":"Z. Bylinskii, T. Judd, A. Oliva, A. Torralba and F. Durand, \u201cWhat do different evaluation metrics tell us about saliency models?,\u201d IEEE transactions on pattern analysis and machine intelligence, vol. 41, no. 3, pp. 
740-757, 2018."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2996463"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12559-010-9074-z"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1167\/14.8.5"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4939-3435-5_16"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10584-0_33"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2019.05.001"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2015.08.004"}],"event":{"name":"ICAIIS 2021: 2021 2nd International Conference on Artificial Intelligence and Information Systems","acronym":"ICAIIS 2021","location":"Chongqing China"},"container-title":["2021 2nd International Conference on Artificial Intelligence and Information Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3469213.3470254","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3469213.3470254","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:36Z","timestamp":1750195716000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3469213.3470254"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,28]]},"references-count":23,"alternative-id":["10.1145\/3469213.3470254","10.1145\/3469213"],"URL":"https:\/\/doi.org\/10.1145\/3469213.3470254","relation":{},"subject":[],"published":{"date-parts":[[2021,5,28]]},"assertion":[{"value":"2021-08-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}