{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T03:17:41Z","timestamp":1773803861451,"version":"3.50.1"},"reference-count":44,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Fundamentals"],"published-print":{"date-parts":[[2025,4,1]]},"DOI":"10.1587\/transfun.2023eap1130","type":"journal-article","created":{"date-parts":[[2024,10,15]],"date-time":"2024-10-15T22:11:18Z","timestamp":1729030278000},"page":"582-596","source":"Crossref","is-referenced-by-count":2,"title":["ACSTNet: An Attention Cross Stage Transformers Network for Small Object Detection in Remote Sensing Images"],"prefix":"10.1587","volume":"E108.A","author":[{"given":"Yang","family":"LIU","sequence":"first","affiliation":[{"name":"School of Information Science and Technology, Qingdao University of Science and Technology"}]},{"given":"Jialong","family":"WEI","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Qingdao University of Science and Technology"}]},{"given":"Shujian","family":"ZHAO","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Qingdao University of Science and Technology"}]},{"given":"Wenhua","family":"XIE","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Qingdao University of Science and Technology"}]},{"given":"Niankuan","family":"CHEN","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Qingdao University of Science and Technology"}]},{"given":"Jie","family":"LI","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Qingdao University of Science and Technology"}]},{"given":"Xin","family":"CHEN","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Qingdao University of Science and Technology"}]},{"given":"Kaixuan","family":"YANG","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Qingdao University of Science and Technology"}]},{"given":"Yongwei","family":"LI","sequence":"additional","affiliation":[{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences"}]},{"given":"Zhen","family":"ZHAO","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Qingdao University of Science and Technology"}]}],"member":"532","reference":[{"key":"1","doi-asserted-by":"publisher","unstructured":"[1] N. Delavarpour, C. Koparan, J. Nowatzki, S. Bajwa, and X. Sun, \u201cA technical study on UAV characteristics for precision agriculture applications and associated practical challenges,\u201d Remote Sens., vol.13, no.6, pp.1-16, 2021. 10.3390\/rs13061204","DOI":"10.3390\/rs13061204"},{"key":"2","doi-asserted-by":"publisher","unstructured":"[2] N. Kussul, M. Lavreniuk, S. Skakun, and A. Shelestov, \u201cDeep learning classification of land cover and crop types using remote sensing data,\u201d Remote Sens., vol.14, no.5, pp.778-782, 2017. 10.1109\/lgrs.2017.2681128","DOI":"10.1109\/LGRS.2017.2681128"},{"key":"3","doi-asserted-by":"publisher","unstructured":"[3] T. Lei, J. Wang, X. Li, W. Wang, C. Shao, and B. Liu, \u201cFlood disaster monitoring and emergency assessment based on multi-source remote sensing observations,\u201d Water, vol.14, no.14, p.2207, 2022. 10.3390\/w14142207","DOI":"10.3390\/w14142207"},{"key":"4","unstructured":"[4] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, \u201cYOLOX: Exceeding YOLO series in 2021,\u201d arXiv preprint arXiv:2107.08430, 2021. 10.48550\/arXiv.2107.08430"},{"key":"5","doi-asserted-by":"crossref","unstructured":"[5] T.Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll\u00e1r, and C.L. Zitnick, \u201cMicrosoft COCO: Common objects in context,\u201d Eur. Conf. Comput. Vision, 2014. 10.1007\/978-3-319-10602-1_48","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"6","unstructured":"[6] L.C. Chen, G. Papandreou, F. Schroff, and H. Adam, \u201cRethinking atrous convolution for semantic image segmentation,\u201d arXiv preprint arXiv:1706.05587, 2017. 10.48550\/arXiv.1706.05587"},{"key":"7","doi-asserted-by":"crossref","unstructured":"[7] S. Albawi, T.A. Mohammed, and S. Al-Zawi, \u201cUnderstanding of a convolutional neural network,\u201d 2017 International Conference on Engineering and Technology (ICET), pp.1-6, 2017. 10.1109\/icengtechnol.2017.8308186","DOI":"10.1109\/ICEngTechnol.2017.8308186"},{"key":"8","doi-asserted-by":"publisher","unstructured":"[8] N. Sun, W. Li, J. Liu, G. Han, and C. Wu, \u201cFusing object semantics and deep appearance features for scene recognition,\u201d IEEE Trans. Circuits Syst.Video Technol., vol.29, no.6, pp.1715-1728, 2018. 10.1109\/tcsvt.2018.2848543","DOI":"10.1109\/TCSVT.2018.2848543"},{"key":"9","doi-asserted-by":"crossref","unstructured":"[9] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, \u201cSwin transformer: Hierarchical vision transformer using shifted windows,\u201d 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), pp.10\u2006012-10\u2006022, 2021. 10.1109\/iccv48922.2021.00986","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"10","doi-asserted-by":"crossref","unstructured":"[10] C.-Y. Wang, H.-Y.M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, \u201cCSPNet: A new backbone that can enhance learning capability of CNN,\u201d 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp.390-391, 2020. 10.1109\/cvprw50498.2020.00203","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"11","doi-asserted-by":"crossref","unstructured":"[11] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, \u201cPath aggregation network for instance segmentation,\u201d 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp.8759-8768, 2018. 10.1109\/cvpr.2018.00913","DOI":"10.1109\/CVPR.2018.00913"},{"key":"12","doi-asserted-by":"publisher","unstructured":"[12] L. Zhang, L. Zhang, and B. Du, \u201cDeep learning for remote sensing data: A technical tutorial on the state of the art,\u201d IEEE Geosci. Remote Sens. Mag., vol.4, no.2, pp.22-40, 2016. 10.1109\/mgrs.2016.2540798","DOI":"10.1109\/MGRS.2016.2540798"},{"key":"13","doi-asserted-by":"publisher","unstructured":"[13] N. Dalal and B. Triggs, \u201cHistograms of oriented gradients for human detection,\u201d 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), pp.886-893, 2005. 10.1109\/cvpr.2005.177","DOI":"10.1109\/CVPR.2005.177"},{"key":"14","doi-asserted-by":"publisher","unstructured":"[14] R. Wang, M. Yao, D. Zhang, and H. Zou, \u201cA novel orthonormalization matrix based fast and stable DPM algorithm for principal and minor subspace tracking,\u201d IEEE Trans. Signal Process., vol.60, no.1, pp.466-472, 2011. 10.1109\/tsp.2011.2169406","DOI":"10.1109\/TSP.2011.2169406"},{"key":"15","doi-asserted-by":"crossref","unstructured":"[15] R. Girshick, J. Donahue, T. Darrell, and J. Malik, \u201cRich feature hierarchies for accurate object detection and semantic segmentation,\u201d 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp.580-587, 2014. 10.1109\/cvpr.2014.81","DOI":"10.1109\/CVPR.2014.81"},{"key":"16","doi-asserted-by":"crossref","unstructured":"[16] R. Girshick, \u201cFast R-cnn,\u201d Proc. 2015 IEEE International Conference on Computer Vision (ICCV), pp.1440-1448, 2015. 10.1109\/iccv.2015.169","DOI":"10.1109\/ICCV.2015.169"},{"key":"17","doi-asserted-by":"crossref","unstructured":"[17] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, \u201cYou only look once: Unified, real-time object detection,\u201d 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.779-788, 2016. 10.1109\/cvpr.2016.91","DOI":"10.1109\/CVPR.2016.91"},{"key":"18","doi-asserted-by":"crossref","unstructured":"[18] J. Redmon and A. Farhadi, \u201cYOLO9000: Better, faster, stronger,\u201d Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.6517-6525, 2017. 10.1109\/cvpr.2017.690","DOI":"10.1109\/CVPR.2017.690"},{"key":"19","unstructured":"[19] J. Redmon and A. Farhadi, \u201cYOLOv3: An incremental improvement,\u201d arXiv preprint arXiv:1804.02767, 2018. 10.48550\/arXiv.1804.02767"},{"key":"20","doi-asserted-by":"crossref","unstructured":"[20] T.-Y. Lin, P. Doll\u00e1r, R. Girshick, K. He, B. Hariharan, and S. Belongie, \u201cFeature pyramid networks for object detection,\u201d Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp.2117-2125, 2017. 10.1109\/cvpr.2017.106","DOI":"10.1109\/CVPR.2017.106"},{"key":"21","unstructured":"[21] A. Bochkovskiy, C.-Y. Wang, and H.-Y.M. Liao, \u201cYOLOv4: Optimal speed and accuracy of object detection,\u201d arXiv preprint, arXiv:2004.10934, 2020. 10.48550\/arXiv.2004.10934"},{"key":"22","unstructured":"[23] D. Misra, \u201cMish: A self regularized non-monotonic activation function,\u201d arXiv preprint arXiv:1908.08681, 2019. 10.48550\/arXiv.1908.08681"},{"key":"23","doi-asserted-by":"crossref","unstructured":"[24] Z. Su, J. Yu, H. Tan, X. Wan, and K. Qi, \u201cMSA-YOLO: A remote sensing object detection model based on multi-scale strip attention,\u201d Sensors, vol.23, no.15, p.6811, 2023. 10.3390\/s23156811","DOI":"10.3390\/s23156811"},{"key":"24","doi-asserted-by":"publisher","unstructured":"[25] H. Wang, Y. Jin, H. Ke, and X. Zhang, \u201cDDH-YOLOv5: Improved YOLOv5 based on double IoU-aware decoupled head for object detection,\u201d J. Real-Time Image. Proc., vol.19, no.6, pp.1023-1033, 2022. 10.1007\/s11554-022-01241-z","DOI":"10.1007\/s11554-022-01241-z"},{"key":"25","doi-asserted-by":"crossref","unstructured":"[26] Z. Ge, S. Liu, Z. Li, O. Yoshie, and J. Sun, \u201cOTA: Optimal transport assignment for object detection,\u201d 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.303-312, 2021. 10.1109\/cvpr46437.2021.00037","DOI":"10.1109\/CVPR46437.2021.00037"},{"key":"26","doi-asserted-by":"crossref","unstructured":"[27] S. Qiao, L.-C. Chen, and A. Yuille, \u201cDetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution,\u201d Proc. IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp.10213-10224, 2021. 10.1109\/cvpr46437.2021.01008","DOI":"10.1109\/CVPR46437.2021.01008"},{"key":"27","doi-asserted-by":"crossref","unstructured":"[28] M.W. Ashraf, W. Sultani, and M. Shah, \u201cDogfight: Detecting drones from drones videos,\u201d 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.7067-7076, 2021. 10.1109\/cvpr46437.2021.00699","DOI":"10.1109\/CVPR46437.2021.00699"},{"key":"28","doi-asserted-by":"crossref","unstructured":"[29] J. Han, J. Ding, N. Xue, and G.-S. Xia, \u201cReDet: A rotation-equivariant detector for aerial object detection,\u201d Proc. 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.2786-2795, 2021. 10.1109\/cvpr46437.2021.00281","DOI":"10.1109\/CVPR46437.2021.00281"},{"key":"29","doi-asserted-by":"crossref","unstructured":"[30] J. Hu, L. Shen, and G. Sun, \u201cSqueeze-and-excitation networks,\u201d 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp.7132-7141, 2018. 10.1109\/cvpr.2018.00745","DOI":"10.1109\/CVPR.2018.00745"},{"key":"30","doi-asserted-by":"publisher","unstructured":"[31] K. He, X. Zhang, S. Ren, and J. Sun, \u201cSpatial pyramid pooling in deep convolutional networks for visual recognition,\u201d IEEE Trans. Pattern Anal. Mach. Intell., vol.37, no.9, pp.1904-1916, 2015. 10.1109\/TPAMI.2015.2389824","DOI":"10.1109\/TPAMI.2015.2389824"},{"key":"31","doi-asserted-by":"publisher","unstructured":"[32] Y. Ho and S. Wookey, \u201cThe real-world-weight cross-entropy loss function: Modeling the costs of mislabeling,\u201d IEEE Access, vol.8, pp.4806-4813, 2019. 10.1109\/access.2019.2962617","DOI":"10.1109\/ACCESS.2019.2962617"},{"key":"32","unstructured":"[33] J. He, S. Erfani, X. Ma, J. Bailey, Y. Chi, and X.S. Hua, \u201cAlpha-IoU: A family of power intersection over union losses for bounding box regression,\u201d arXiv preprint arXiv:2110.13675, 2021. 10.48550\/arXiv.2110.13675"},{"key":"33","doi-asserted-by":"publisher","unstructured":"Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, \u201cDistance-IoU loss: Faster and better learning for bounding box regression,\u201d Proc. AAAI Conference on Artificial Intelligence, vol.34, no.07, pp.12993-13000, 2020. 10.1609\/aaai.v34i07.6999","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"34","doi-asserted-by":"publisher","unstructured":"[34] K. Li, G. Wan, G. Cheng, L. Meng, and J. Han, \u201cObject detection in optical remote sensing images: A survey and a new benchmark,\u201d ISPRS Journal of Photogrammetry and Remote Sensing, vol.159, pp.296-307, 2020. 10.1016\/j.isprsjprs.2019.11.023","DOI":"10.1016\/j.isprsjprs.2019.11.023"},{"key":"35","doi-asserted-by":"publisher","unstructured":"[35] Y. Long, Y. Gong, Z. Xiao, and Q. Liu, \u201cAccurate object localization in remote sensing images based on convolutional neural networks,\u201d IEEE Trans. Geosci. Remote Sens., vol.55, no.5, pp.2486-2498, 2017. 10.1109\/tgrs.2016.2645610","DOI":"10.1109\/TGRS.2016.2645610"},{"key":"36","doi-asserted-by":"crossref","unstructured":"[36] Y. Yang, Y. Liao, L. Cheng, K. Zhang, H. Wang, and S. Chen, \u201cRemote sensing image aircraft target detection based on GIoU-YOLO v3,\u201d 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP), pp.474-478, 2021. 10.1109\/icsp51882.2021.9408837","DOI":"10.1109\/ICSP51882.2021.9408837"},{"key":"37","doi-asserted-by":"crossref","unstructured":"[37] Z. Cai and N. Vasconcelos, \u201cCascade R-CNN: Delving into high quality object detection,\u201d 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp.6154-6162, 2018. 10.1109\/cvpr.2018.00644","DOI":"10.1109\/CVPR.2018.00644"},{"key":"38","doi-asserted-by":"crossref","unstructured":"[38] B. Jiang, R. Luo, J. Mao, T. Xiao, and Y. Jiang, \u201cAcquisition of localization confidence for accurate object detection,\u201d Computer Vision - ECCV 2018, pp.816-832, 2018. 10.1007\/978-3-030-01264-9_48","DOI":"10.1007\/978-3-030-01264-9_48"},{"key":"39","doi-asserted-by":"publisher","unstructured":"[39] H. Wei, Y. Zhang, Z. Chang, H. Li, H. Wang, and X. Sun, \u201cOriented objects as pairs of middle lines,\u201d ISPRS Journal of Photogrammetry and Remote Sensing, vol.169, pp.268-279, 2020. 10.1016\/j.isprsjprs.2020.09.022","DOI":"10.1016\/j.isprsjprs.2020.09.022"},{"key":"40","doi-asserted-by":"publisher","unstructured":"[40] J. Ma, W. Shao, H. Ye, L. Wang, H. Wang, Y. Zheng, and X. Xue, \u201cArbitrary-oriented scene text detection via rotation proposals,\u201d IEEE Trans. Multimedia, vol.20, no.11, pp.3111-3122, 2018. 10.1109\/tmm.2018.2818020","DOI":"10.1109\/TMM.2018.2818020"},{"key":"41","doi-asserted-by":"crossref","unstructured":"[41] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Doll\u00e1r, \u201cFocal loss for dense object detection,\u201d 2017 IEEE International Conference on Computer Vision (ICCV), pp.2980-2988, 2017. 10.1109\/iccv.2017.324","DOI":"10.1109\/ICCV.2017.324"},{"key":"42","unstructured":"[42] Y. Yi, X. Yang, Q. Li, F. Da, J. Yan, J. Dai, and Y. Qiao, \u201cPoint2RBox: Combine knowledge from synthetic visual patterns for end-to-end oriented object detection with single point supervision,\u201d arXiv preprint, arXiv:2311.14758, 2023. 10.48550\/arXiv.2311.14758"},{"key":"43","doi-asserted-by":"publisher","unstructured":"[43] D. Liang, Q. Geng, Z. Wei, D.A. Vorontsov, E.L. Kim, M. Wei, and H. Zhou, \u201cAnchor retouching via model interaction for robust object detection in aerial images,\u201d IEEE Trans. Geosci. Remote Sens., vol.60, pp.1-13, 2022. 10.1109\/tgrs.2021.3136350","DOI":"10.1109\/TGRS.2021.3136350"},{"key":"44","doi-asserted-by":"crossref","unstructured":"[44] R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, \u201cGrad-CAM: Visual explanations from deep networks via gradient-based localization,\u201d 2017 IEEE International Conference on Computer Vision (ICCV), pp.618-626, 2017. 10.1109\/iccv.2017.74","DOI":"10.1109\/ICCV.2017.74"}],"container-title":["IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transfun\/E108.A\/4\/E108.A_2023EAP1130\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,5]],"date-time":"2025-04-05T03:18:15Z","timestamp":1743823095000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transfun\/E108.A\/4\/E108.A_2023EAP1130\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,1]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025]]}},"URL":"https:\/\/doi.org\/10.1587\/transfun.2023eap1130","relation":{},"ISSN":["0916-8508","1745-1337"],"issn-type":[{"value":"0916-8508","type":"print"},{"value":"1745-1337","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,1]]},"article-number":"2023EAP1130"}}