{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T14:23:49Z","timestamp":1768141429983,"version":"3.49.0"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,4,26]],"date-time":"2022-04-26T00:00:00Z","timestamp":1650931200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,4,26]],"date-time":"2022-04-26T00:00:00Z","timestamp":1650931200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100014440","name":"Ministerio de Ciencia, Innovaci\u00f3n y Universidades","doi-asserted-by":"publisher","award":["PID2020-112623GB-I00"],"award-info":[{"award-number":["PID2020-112623GB-I00"]}],"id":[{"id":"10.13039\/100014440","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100014440","name":"Ministerio de Ciencia, Innovaci\u00f3n y Universidades","doi-asserted-by":"publisher","award":["RTI2018-097088-B-C32"],"award-info":[{"award-number":["RTI2018-097088-B-C32"]}],"id":[{"id":"10.13039\/100014440","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008425","name":"Conseller\u00eda de Cultura, Educaci\u00f3n e Ordenaci\u00f3n Universitaria, Xunta de Galicia","doi-asserted-by":"publisher","award":["ED431C 2018\/29"],"award-info":[{"award-number":["ED431C 2018\/29"]}],"id":[{"id":"10.13039\/501100008425","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008425","name":"Conseller\u00eda de Cultura, Educaci\u00f3n e Ordenaci\u00f3n Universitaria, Xunta de Galicia","doi-asserted-by":"publisher","award":["ED431C 2017\/69"],"award-info":[{"award-number":["ED431C 2017\/69"]}],"id":[{"id":"10.13039\/501100008425","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008425","name":"Conseller\u00eda de 
Cultura, Educaci\u00f3n e Ordenaci\u00f3n Universitaria, Xunta de Galicia","doi-asserted-by":"publisher","award":["ED431G\/08"],"award-info":[{"award-number":["ED431G\/08"]}],"id":[{"id":"10.13039\/501100008425","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100015068","name":"Universidade de Santiago de Compostela","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100015068","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Intell"],"published-print":{"date-parts":[[2023,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper addresses the problem of exploiting spatiotemporal information to improve small object detection precision in video. We propose a two-stage object detector called FANet based on short-term spatiotemporal feature aggregation and long-term object linking to refine object detections. First, we generate a set of short tubelet proposals. Then, we aggregate RoI pooled deep features throughout the tubelet using a new temporal pooling operator that summarizes the information with a fixed output size independent of the tubelet length. In addition, we define a double head implementation that we feed with spatiotemporal information for spatiotemporal classification and with spatial information for object localization and spatial classification. Finally, a long-term linking method builds long tubes with the previously calculated short tubelets to overcome detection errors. The association strategy addresses the generally low overlap between instances of small objects in consecutive frames by reducing the influence of the overlap in the final linking score. 
We evaluated our model in three different datasets with small objects, outperforming previous state-of-the-art spatiotemporal object detectors and our spatial baseline.<\/jats:p>","DOI":"10.1007\/s10489-022-03529-w","type":"journal-article","created":{"date-parts":[[2022,4,26]],"date-time":"2022-04-26T14:03:39Z","timestamp":1650981819000},"page":"1205-1217","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Spatiotemporal tubelet feature aggregation and object linking for small object detection in videos"],"prefix":"10.1007","volume":"53","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5548-4837","authenticated-orcid":false,"given":"Daniel","family":"Cores","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"V\u00edctor M.","family":"Brea","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Manuel","family":"Mucientes","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,4,26]]},"reference":[{"key":"3529_CR1","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: Single Shot multibox detector. In: European conference on computer vision (ECCV)","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"3529_CR2","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: IEEE Conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2016.91"},{"key":"3529_CR3","doi-asserted-by":"crossref","unstructured":"Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. 
In: IEEE Conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2014.81"},{"key":"3529_CR4","doi-asserted-by":"crossref","unstructured":"Girshick R (2015) Fast R-CNN. In: IEEE International conference on computer vision (ICCV)","DOI":"10.1109\/ICCV.2015.169"},{"key":"3529_CR5","unstructured":"Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems (NIPS)"},{"key":"3529_CR6","doi-asserted-by":"publisher","first-page":"103615","DOI":"10.1016\/j.engappai.2020.103615","volume":"91","author":"B Bosquet","year":"2020","unstructured":"Bosquet B, Mucientes M, Brea VM (2020) STDnet: Exploiting high resolution feature maps for small object detection. Eng Appl Artif Intell 91:103615","journal-title":"Eng Appl Artif Intell"},{"key":"3529_CR7","doi-asserted-by":"crossref","unstructured":"Du D, Qi Y, Yu H, Yang Y, Duan K, Li G, Zhang W, Huang Q, Tian Q (2018) The unmanned aerial vehicle benchmark: Object detection and tracking. In: European conference on computer vision (ECCV), pp 370\u2013386","DOI":"10.1007\/978-3-030-01249-6_23"},{"key":"3529_CR8","doi-asserted-by":"crossref","unstructured":"Zhu P, Wen L, Du D, Bian X, Ling H, Hu Q, Nie Q, Cheng H, Liu C, Liu X et al (2018) Visdrone-det2018: The vision meets drone object detection in image challenge results. In: European conference on computer vision (ECCV)","DOI":"10.1109\/ICCVW.2019.00031"},{"key":"3529_CR9","doi-asserted-by":"crossref","unstructured":"Pal SK, Pramanik A, Maiti J, Mitra P (2021) Deep learning in multi-object detection and tracking: state of the art. Appl Intell, 1\u201330","DOI":"10.1007\/s10489-021-02293-7"},{"key":"3529_CR10","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Doll\u00e1r P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. 
In: IEEE Conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2017.106"},{"key":"3529_CR11","doi-asserted-by":"crossref","unstructured":"Chalavadi V, Jeripothula P, Datla R, Ch SB et al (2022) Msodanet: A network for multi-scale object detection in aerial images using hierarchical dilated convolutions. Pattern Recognition, 108548","DOI":"10.1016\/j.patcog.2022.108548"},{"key":"3529_CR12","doi-asserted-by":"crossref","unstructured":"Cai Z, Vasconcelos N, Cascade R-CNN (2018) Delving into high quality object detection. In: IEEE Conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2018.00644"},{"key":"3529_CR13","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Goyal P, Girshick R, He K, Doll\u00e1r P (2017) Focal loss for dense object detection. In: IEEE Conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/ICCV.2017.324"},{"key":"3529_CR14","doi-asserted-by":"crossref","unstructured":"Zhu X, Wang Y, Dai J, Yuan L, Wei Y (2017) Flow-guided feature aggregation for video object detection. In: IEEE International conference on computer vision (ICCV)","DOI":"10.1109\/ICCV.2017.52"},{"key":"3529_CR15","doi-asserted-by":"crossref","unstructured":"Xie J, Gao C, Wu J, Shi Z, Chen J (2021) Small low-contrast target detection: Data-driven spatiotemporal feature fusion and implementation, IEEE Transactions on Cybernetics","DOI":"10.1109\/TCYB.2021.3072311"},{"key":"3529_CR16","doi-asserted-by":"crossref","unstructured":"Kang K, Ouyang W, Li H, Wang X (2016) Object detection from video tubelets with convolutional neural networks. 
In: IEEE Conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2016.95"},{"issue":"10","key":"3529_CR17","doi-asserted-by":"publisher","first-page":"2896","DOI":"10.1109\/TCSVT.2017.2736553","volume":"28","author":"K Kang","year":"2017","unstructured":"Kang K, Li H, Yan J, Zeng X, Yang B, Xiao T, Zhang C, Wang Z, Wang R, Wang X et al (2017) T-CNN: Tubelets with convolutional neural networks for object detection from videos. IEEE Trans Circuits Syst Video Technol 28(10):2896\u20132907","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"key":"3529_CR18","doi-asserted-by":"crossref","unstructured":"Kang K, Li H, Xiao T, Ouyang W, Yan J, Liu X, Wang X (2017) Object detection in videos with tubelet proposal networks. In: IEEE Conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2017.101"},{"key":"3529_CR19","doi-asserted-by":"crossref","unstructured":"Gong T, Chen K, Wang X, Chu Q, Zhu F, Lin D, Yu N, Feng H (2021) Temporal ROI align for video object recognition. In: Conference on artificial intelligence (AAAI), vol 35, pp 1442\u20131450","DOI":"10.1609\/aaai.v35i2.16234"},{"key":"3529_CR20","doi-asserted-by":"crossref","unstructured":"Kalogeiton V, Weinzaepfel P, Ferrari V, Schmid C (2017) Action tubelet detector for spatio-temporal action localization. In: IEEE International conference on computer vision (ICCV)","DOI":"10.1109\/ICCV.2017.472"},{"key":"3529_CR21","doi-asserted-by":"crossref","unstructured":"Tang P, Wang C, Wang X, Liu W, Zeng W, Wang J (2019) Object detection in videos by high quality object linking, IEEE Transactions on Pattern Analysis and Machine Intelligence","DOI":"10.1109\/TPAMI.2019.2910529"},{"key":"3529_CR22","doi-asserted-by":"crossref","unstructured":"Feichtenhofer C, Pinz A, Zisserman A (2017) Detect to track and track to detect. 
In: IEEE International conference on computer vision (ICCV)","DOI":"10.1109\/ICCV.2017.330"},{"key":"3529_CR23","doi-asserted-by":"crossref","unstructured":"Wu H, Chen Y, Wang N, Zhang Z (2019) Sequence level semantics aggregation for video object detection. In: IEEE International conference on computer vision (ICCV), pp 9217\u20139225","DOI":"10.1109\/ICCV.2019.00931"},{"key":"3529_CR24","doi-asserted-by":"crossref","unstructured":"Chen Y, Cao Y, Hu H, Wang L (2020) Memory enhanced global-local aggregation for video object detection. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 10337\u201310346","DOI":"10.1109\/CVPR42600.2020.01035"},{"key":"3529_CR25","doi-asserted-by":"crossref","unstructured":"Deng J, Pan Y, Yao T, Zhou W, Li H, Mei T (2019) Relation distillation networks for video object detection. In: IEEE international conference on computer vision (ICCV), pp 7023\u20137032","DOI":"10.1109\/ICCV.2019.00712"},{"key":"3529_CR26","doi-asserted-by":"crossref","unstructured":"Bosquet B, Mucientes M, Brea VM (2021) STDnet-ST: Spatio-temporal convnet for small object detection, Pattern Recognition, 107929","DOI":"10.1016\/j.patcog.2021.107929"},{"key":"3529_CR27","doi-asserted-by":"crossref","unstructured":"He K, Gkioxari G, Doll\u00e1r P, Girshick R (2017) Mask r-CNN. In: IEEE International conference on computer vision (ICCV)","DOI":"10.1109\/ICCV.2017.322"},{"key":"3529_CR28","doi-asserted-by":"crossref","unstructured":"Wu Y, Chen Y, Yuan L, Liu Z, Wang L, Li H, Fu Y (2020) Rethinking classification and localization for object detection. In: IEEE Conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR42600.2020.01020"},{"key":"3529_CR29","doi-asserted-by":"crossref","unstructured":"Gkioxari G, Malik J (2015) Finding action tubes. 
In: IEEE Conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2015.7298676"},{"key":"3529_CR30","doi-asserted-by":"crossref","unstructured":"Saha S, Singh G, Sapienza M, Torr P, Cuzzolin F (2016) Deep learning for detecting multiple space-time action tubes in videos. In: British machine vision conference (BMVC)","DOI":"10.5244\/C.30.58"},{"key":"3529_CR31","doi-asserted-by":"crossref","unstructured":"Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S (2019) Generalized intersection over union: A metric and a loss for bounding box regression. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 658\u2013666","DOI":"10.1109\/CVPR.2019.00075"},{"key":"3529_CR32","doi-asserted-by":"crossref","unstructured":"Gidaris S, Komodakis N (2015) Object detection via a multi-region and semantic segmentation-aware CNN model. In: IEEE International conference on computer vision (ICCV)","DOI":"10.1109\/ICCV.2015.135"},{"key":"3529_CR33","unstructured":"Bosquet B, Mucientes M, Brea VM (2018) STDNet: A convnet for small target detection. In: British machine vision conference (BMVC)"},{"key":"3529_CR34","doi-asserted-by":"crossref","unstructured":"Xie S, Girshick R, Doll\u00e1r P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. 
In: IEEE conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2017.634"}],"container-title":["Applied Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-022-03529-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10489-022-03529-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-022-03529-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,3]],"date-time":"2023-01-03T04:59:50Z","timestamp":1672721990000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10489-022-03529-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,26]]},"references-count":34,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1]]}},"alternative-id":["3529"],"URL":"https:\/\/doi.org\/10.1007\/s10489-022-03529-w","relation":{},"ISSN":["0924-669X","1573-7497"],"issn-type":[{"value":"0924-669X","type":"print"},{"value":"1573-7497","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,26]]},"assertion":[{"value":"19 March 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 April 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}