{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,30]],"date-time":"2025-03-30T21:44:35Z","timestamp":1743371075089},"reference-count":60,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2024,2,22]],"date-time":"2024-02-22T00:00:00Z","timestamp":1708560000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,22]],"date-time":"2024-02-22T00:00:00Z","timestamp":1708560000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Important Research Project of Hebei Province","award":["22370301D"],"award-info":[{"award-number":["22370301D"]}]},{"name":"Scientific Research Foundation of Hebei University for Distinguished Young Scholars","award":["521100221081"],"award-info":[{"award-number":["521100221081"]}]},{"name":"Scientific Research Foundation of Colleges and Universities in Hebei Province","award":["QN2022107"],"award-info":[{"award-number":["QN2022107"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Temporal action proposal generation in an untrimmed video is very challenging, and comprehensive context exploration is critically important to generate accurate candidates of action instances. This paper proposes a Temporal-aware Attention Network (TAN) that localizes context-rich proposals by enhancing the temporal representations of boundaries and proposals. Firstly, we pinpoint that obtaining precise location information of action instances needs to consider long-distance temporal contexts. To this end, we propose a Global-Aware Attention (GAA) module for boundary-level interaction. Specifically, we introduce two novel gating mechanisms into the top-down interaction structure to incorporate multi-level semantics into video features effectively. Secondly, we design an efficient task-specific Adaptive Temporal Interaction (ATI) module to learn proposal associations. TAN enhances proposal-level contextual representations in a wide range by utilizing multi-scale interaction modules. Extensive experiments on the ActivityNet-1.3 and THUMOS-14 demonstrate the effectiveness of our proposed method, e.g., TAN achieves 73.43% in AR@1000 on THUMOS-14 and 69.01% in AUC on ActivityNet-1.3. Moreover, TAN significantly improves temporal action detection performance when equipped with existing action classification frameworks.<\/jats:p>","DOI":"10.1007\/s40747-024-01343-0","type":"journal-article","created":{"date-parts":[[2024,2,22]],"date-time":"2024-02-22T18:02:40Z","timestamp":1708624960000},"page":"3691-3708","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["TAN: a temporal-aware attention network with context-rich representation for boosting proposal generation"],"prefix":"10.1007","volume":"10","author":[{"given":"Yanyan","family":"Jiao","sequence":"first","affiliation":[]},{"given":"Wenzhu","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Wenjie","family":"Xing","sequence":"additional","affiliation":[]},{"given":"Shuang","family":"Zeng","sequence":"additional","affiliation":[]},{"given":"Lei","family":"Geng","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,2,22]]},"reference":[{"key":"1343_CR1","doi-asserted-by":"crossref","unstructured":"Arnab, A., et al., ViViT: A Video Vision Transformer, in IEEE\/CVF International Conference on Computer Vision. 2021. p. 6836\u20136846.","DOI":"10.1109\/ICCV48922.2021.00676"},{"key":"1343_CR2","first-page":"121","volume-title":"European Conference on Computer Vision","author":"Y Bai","year":"2020","unstructured":"Bai Y et al (2020) Boundary content graph neural network for temporal action proposal generation. European Conference on Computer Vision. Springer, pp 121\u2013137"},{"key":"1343_CR3","unstructured":"Bertasius, G., H. Wang, and L. Torresani, Is Space-Time Attention All You Need for Video Understanding?, in International Conference on Machine Learning. 2021, PMLR. p. 813\u2013824."},{"key":"1343_CR4","unstructured":"Bochkovskiy, A., C.-Y. Wang, and H.-Y.M. Liao, Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020."},{"key":"1343_CR5","doi-asserted-by":"crossref","unstructured":"Caba Heilbron, F., et al., Activitynet: A large-scale video benchmark for human activity understanding, in IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2015. p. 961\u2013970.","DOI":"10.1109\/CVPR.2015.7298698"},{"key":"1343_CR6","doi-asserted-by":"crossref","unstructured":"Chao, Y.-W., et al. Rethinking the faster r-cnn architecture for temporal action localization. in Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.","DOI":"10.1109\/CVPR.2018.00124"},{"key":"1343_CR7","doi-asserted-by":"crossref","unstructured":"Chen, G., et al., DCAN: Improving temporal action detection via dual context aggregation, in AAAI Conference on Artificial Intelligence. 2022. p. 248\u2013257.","DOI":"10.1609\/aaai.v36i1.19900"},{"issue":"10","key":"1343_CR8","doi-asserted-by":"publisher","first-page":"2723","DOI":"10.1109\/TMM.2019.2959977","volume":"22","author":"P Chen","year":"2019","unstructured":"Chen P et al (2019) Relation attention for temporal action localization. IEEE Trans Multimedia 22(10):2723\u20132733","journal-title":"IEEE Trans Multimedia"},{"key":"1343_CR9","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., A. Pinz, and A. Zisserman, Convolutional two-stream network fusion for video action recognition, in IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2016. p. 1933\u20131941.","DOI":"10.1109\/CVPR.2016.213"},{"key":"1343_CR10","doi-asserted-by":"crossref","unstructured":"Gao, J., K. Chen, and R. Nevatia, Ctap: Complementary temporal action proposal generation, in European conference on computer vision. 2018. p. 68\u201383.","DOI":"10.1007\/978-3-030-01216-8_5"},{"key":"1343_CR11","doi-asserted-by":"crossref","unstructured":"Gao, J., et al., Accurate temporal action proposal generation with relation-aware pyramid network, in AAAI Conference on Artificial Intelligence. 2020. p. 10810\u201310817.","DOI":"10.1609\/aaai.v34i07.6711"},{"key":"1343_CR12","doi-asserted-by":"crossref","unstructured":"Gao, J., et al., Turn tap: Temporal unit regression network for temporal action proposals, in IEEE\/CVF International Conference on Computer Vision. 2017. p. 3628\u20133636.","DOI":"10.1109\/ICCV.2017.392"},{"key":"1343_CR13","doi-asserted-by":"crossref","unstructured":"Gao, J., Z. Yang, and R. Nevatia, Cascaded boundary regression for temporal action detection. arXiv preprint arXiv:1705.01180, 2017.","DOI":"10.5244\/C.31.52"},{"key":"1343_CR14","doi-asserted-by":"crossref","unstructured":"Girdhar, R., et al. Video action transformer network. in Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 2019.","DOI":"10.1109\/CVPR.2019.00033"},{"key":"1343_CR15","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107267","volume":"103","author":"T Han","year":"2020","unstructured":"Han T et al (2020) TVENet: Temporal variance embedding network for fine-grained action representation. Pattern Recogn 103:107267","journal-title":"Pattern Recogn"},{"key":"1343_CR16","doi-asserted-by":"crossref","unstructured":"He, K., et al., Deep residual learning for image recognition, in IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2016. p. 770\u2013778.","DOI":"10.1109\/CVPR.2016.90"},{"key":"1343_CR17","unstructured":"Huang, Z., et al., TAda! Temporally-Adaptive Convolutions for Video Understanding. arXiv preprint arXiv:2110.06178, 2021."},{"key":"1343_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.cviu.2016.10.018","volume":"155","author":"H Idrees","year":"2017","unstructured":"Idrees H et al (2017) The THUMOS challenge on action recognition for videos \u201cin the wild.\u201d Comput Vis Image Underst 155:1\u201323","journal-title":"Comput Vis Image Underst"},{"key":"1343_CR19","unstructured":"Ioffe, S. and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in International Conference on Machine Learning. 2015, PMLR. p. 448\u2013456."},{"key":"1343_CR20","unstructured":"Jia, X., et al., Dynamic filter networks. Advances in neural information processing systems, 2016. 29."},{"key":"1343_CR21","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2022.118965","volume":"213","author":"P Li","year":"2023","unstructured":"Li P, Cao J, Ye X (2023) Prototype contrastive learning for point-supervised temporal action detection. Expert Syst Appl 213:118965","journal-title":"Expert Syst Appl"},{"key":"1343_CR22","unstructured":"Li, Y., et al., Revisiting dynamic convolution via matrix decomposition. arXiv preprint arXiv:2103.08756, 2021."},{"key":"1343_CR23","doi-asserted-by":"crossref","unstructured":"Lin, C., et al., Fast learning of temporal action proposal via dense boundary generator, in AAAI Conference on Artificial Intelligence. 2020. p. 11499\u201311506.","DOI":"10.1609\/aaai.v34i07.6815"},{"key":"1343_CR24","first-page":"3319","volume":"2021","author":"C Lin","year":"2021","unstructured":"Lin C et al (2021) Learning Salient Boundary Feature for Anchor-free Temporal Action Localization. IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021:3319\u20133328","journal-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)"},{"key":"1343_CR25","doi-asserted-by":"crossref","unstructured":"Lin, T., et al., BMN: Boundary-Matching Network for Temporal Action Proposal Generation, in IEEE\/CVF International Conference on Computer Vision. 2019. p. 3888\u20133897.","DOI":"10.1109\/ICCV.2019.00399"},{"key":"1343_CR26","first-page":"3","volume-title":"European Conference on Computer Vision","author":"T Lin","year":"2018","unstructured":"Lin T et al (2018) BSN: Boundary Sensitive Network for Temporal Action Proposal Generation. European Conference on Computer Vision. Springer, pp 3\u201321"},{"key":"1343_CR27","doi-asserted-by":"crossref","unstructured":"Liu, Q. and Z. Wang. Progressive boundary refinement network for temporal action detection. in Proceedings of the AAAI Conference on Artificial Intelligence. 2020.","DOI":"10.1609\/aaai.v34i07.6829"},{"key":"1343_CR28","first-page":"530","volume-title":"Asian Conference on Computer Vision","author":"S Liu","year":"2020","unstructured":"Liu S et al (2020) TSI: Temporal Scale Invariant Network for Action Proposal Generation. Asian Conference on Computer Vision. Springer, pp 530\u2013546"},{"key":"1343_CR29","doi-asserted-by":"crossref","unstructured":"Liu, Y., et al., Multi-Granularity Generator for Temporal Action Proposal, in IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2019. p. 3604\u20133613.","DOI":"10.1109\/CVPR.2019.00372"},{"key":"1343_CR30","doi-asserted-by":"crossref","unstructured":"Liu, Z., et al., Tam: Temporal adaptive module for video recognition, in IEEE\/CVF International Conference on Computer Vision. 2021. p. 13708\u201313718.","DOI":"10.1109\/ICCV48922.2021.01345"},{"key":"1343_CR31","doi-asserted-by":"crossref","unstructured":"Long, F., et al., Gaussian temporal awareness networks for action localization, in IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2019. p. 344\u2013353.","DOI":"10.1109\/CVPR.2019.00043"},{"key":"1343_CR32","doi-asserted-by":"crossref","unstructured":"Qing, Z., et al., Temporal context aggregation network for temporal action proposal refinement, in IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2021. p. 485\u2013494.","DOI":"10.1109\/CVPR46437.2021.00055"},{"key":"1343_CR33","doi-asserted-by":"crossref","unstructured":"Qiu, Z., T. Yao, and T. Mei, Learning spatio-temporal representation with pseudo-3d residual networks, in IEEE\/CVF International Conference on Computer Vision. 2017. p. 5533\u20135541.","DOI":"10.1109\/ICCV.2017.590"},{"key":"1343_CR34","unstructured":"Redmon, J. and A. Farhadi, Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018."},{"issue":"6","key":"1343_CR35","doi-asserted-by":"publisher","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","volume":"39","author":"S Ren","year":"2016","unstructured":"Ren S et al (2016) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137\u20131149","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1343_CR36","doi-asserted-by":"crossref","unstructured":"Shou, Z., D. Wang, and S.-F. Chang, Temporal action localization in untrimmed videos via multi-stage cnns, in IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2016. p. 1049\u20131058.","DOI":"10.1109\/CVPR.2016.119"},{"key":"1343_CR37","doi-asserted-by":"crossref","unstructured":"Su, H., et al., Bsn++: Complementary boundary regressor with scale-balanced relation modeling for temporal action proposal generation, in AAAI Conference on Artificial Intelligence. 2021. p. 2602\u20132610.","DOI":"10.1609\/aaai.v35i3.16363"},{"key":"1343_CR38","first-page":"558","volume-title":"Asian Conference on Computer Vision","author":"H Su","year":"2018","unstructured":"Su H, Zhao X, Lin T (2018) Cascaded pyramid mining network for weakly supervised temporal action localization. Asian Conference on Computer Vision. Springer, pp 558\u2013574"},{"key":"1343_CR39","doi-asserted-by":"publisher","first-page":"1503","DOI":"10.1109\/TMM.2020.2999184","volume":"23","author":"H Su","year":"2020","unstructured":"Su H et al (2020) Transferable knowledge-based multi-granularity fusion network for weakly supervised temporal action detection. IEEE Trans Multimedia 23:1503\u20131515","journal-title":"IEEE Trans Multimedia"},{"key":"1343_CR40","doi-asserted-by":"crossref","unstructured":"Tan, J., et al., Relaxed Transformer Decoders for Direct Action Proposal Generation, in IEEE\/CVF International Conference on Computer Vision. 2021. p. 13506\u201313515.","DOI":"10.1109\/ICCV48922.2021.01327"},{"key":"1343_CR41","doi-asserted-by":"crossref","unstructured":"Tran, D., et al., Learning spatiotemporal features with 3d convolutional networks, in IEEE\/CVF International Conference on Computer Vision. 2015. p. 4489\u20134497.","DOI":"10.1109\/ICCV.2015.510"},{"key":"1343_CR42","doi-asserted-by":"crossref","unstructured":"Tran, D., et al., A Closer Look at Spatiotemporal Convolutions for Action Recognition, in IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2018. p. 6450\u20136459.","DOI":"10.1109\/CVPR.2018.00675"},{"key":"1343_CR43","unstructured":"Vaswani, A., et al., Attention is all you need. Advances in neural information processing systems, 2017. 30."},{"key":"1343_CR44","doi-asserted-by":"publisher","first-page":"126431","DOI":"10.1109\/ACCESS.2021.3110973","volume":"9","author":"K Vo","year":"2021","unstructured":"Vo K et al (2021) ABN: agent-aware boundary networks for temporal action proposal generation. IEEE Access 9:126431\u2013126445","journal-title":"IEEE Access"},{"key":"1343_CR45","doi-asserted-by":"crossref","unstructured":"Wang, L., et al., Untrimmednets for weakly supervised action recognition and detection, in IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2017. p. 4325\u20134334.","DOI":"10.1109\/CVPR.2017.678"},{"key":"1343_CR46","doi-asserted-by":"crossref","unstructured":"Wang, L., et al., Temporal segment networks: Towards good practices for deep action recognition, in European Conference on Computer Vision. 2016. p. 20\u201336.","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"1343_CR47","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2023.01.045","volume":"538","author":"L Wang","year":"2023","unstructured":"Wang L et al (2023) MIFNet: Multiple instances focused temporal action proposal generation. Neurocomputing 538:126025","journal-title":"Neurocomputing"},{"key":"1343_CR48","doi-asserted-by":"crossref","unstructured":"Wang, X., et al. Skeleton-based action recognition via adaptive cross-form learning. in Proceedings of the 30th ACM International Conference on Multimedia. 2022.","DOI":"10.1145\/3503161.3547811"},{"key":"1343_CR49","doi-asserted-by":"crossref","unstructured":"Wang, X., et al., Non-local Neural Networks, in IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2018. p. 7794\u20137803.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"1343_CR50","unstructured":"Xiong, Y., et al., Cuhk & ethz & siat submission to activitynet challenge 2016. arXiv preprint arXiv:1608.00797, 2016."},{"key":"1343_CR51","first-page":"5794","volume":"2017","author":"H Xu","year":"2017","unstructured":"Xu H, Das A, Saenko K (2017) R-C3D: Region Convolutional 3D Network for Temporal Activity Detection. IEEE International Conference on Computer Vision (ICCV) 2017:5794\u20135803","journal-title":"IEEE International Conference on Computer Vision (ICCV)"},{"key":"1343_CR52","doi-asserted-by":"crossref","unstructured":"Xu, M., et al., G-tad: Sub-graph localization for temporal action detection, in IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2020. p. 10156\u201310165.","DOI":"10.1109\/CVPR42600.2020.01017"},{"key":"1343_CR53","unstructured":"Yang, B., et al., Condconv: Conditionally parameterized convolutions for efficient inference, in Advances in Neural Information Processing Systems. 2019."},{"key":"1343_CR54","doi-asserted-by":"crossref","unstructured":"Yang, Y., et al., Exploiting semantic-level affinities with a mask-guided network for temporal action proposal in videos. Applied Intelligence, 2022: p. 1\u201321.","DOI":"10.1007\/s10489-022-04261-1"},{"key":"1343_CR55","doi-asserted-by":"crossref","unstructured":"Zeng, R., et al., Graph Convolutional Networks for Temporal Action Localization, in IEEE\/CVF International Conference on Computer Vision. 2019.","DOI":"10.1109\/ICCV.2019.00719"},{"key":"1343_CR56","doi-asserted-by":"crossref","unstructured":"Zhang, H., et al., MTSCANet: Multi temporal resolution temporal semantic context aggregation network. IET Computer Vision, 2023.","DOI":"10.1049\/cvi2.12163"},{"key":"1343_CR57","first-page":"539","volume-title":"European Conference on Computer Vision","author":"P Zhao","year":"2020","unstructured":"Zhao P et al (2020) Bottom-up temporal action localization with mutual regularization. European Conference on Computer Vision. Springer, pp 539\u2013555"},{"issue":"1","key":"1343_CR58","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1007\/s11263-019-01211-2","volume":"128","author":"Y Zhao","year":"2020","unstructured":"Zhao Y et al (2020) Temporal Action Detection with Structured Segment Networks. Int J Comput Vision 128(1):74\u201396","journal-title":"Int J Comput Vision"},{"key":"1343_CR59","doi-asserted-by":"publisher","first-page":"4746","DOI":"10.1109\/TIP.2022.3182866","volume":"31","author":"Y Zhao","year":"2022","unstructured":"Zhao Y et al (2022) A temporal-aware relation and attention network for temporal action localization. IEEE Trans Image Process 31:4746\u20134760","journal-title":"IEEE Trans Image Process"},{"key":"1343_CR60","doi-asserted-by":"crossref","unstructured":"Zhou, J., et al., Decoupled dynamic filter networks, in IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2021. p. 6647\u20136656.","DOI":"10.1109\/CVPR46437.2021.00658"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01343-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-024-01343-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01343-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T18:18:23Z","timestamp":1715883503000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-024-01343-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,22]]},"references-count":60,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,6]]}},"alternative-id":["1343"],"URL":"https:\/\/doi.org\/10.1007\/s40747-024-01343-0","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,22]]},"assertion":[{"value":"31 August 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 January 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 February 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}