{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,15]],"date-time":"2026-03-15T16:04:09Z","timestamp":1773590649413,"version":"3.50.1"},"reference-count":61,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2021,8,18]],"date-time":"2021-08-18T00:00:00Z","timestamp":1629244800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>To apply powerful deep-learning-based algorithms for object detection and classification in infrared videos, it is necessary to have more training data in order to build high-performance models. However, in many surveillance applications, one can have a lot more optical videos than infrared videos. This lack of IR video datasets can be mitigated if optical-to-infrared video conversion is possible. In this paper, we present a new approach for converting optical videos to infrared videos using deep learning. The basic idea is to focus on target areas using attention generative adversarial network (attention GAN), which will preserve the fidelity of target areas. The approach does not require paired images. The performance of the proposed attention GAN has been demonstrated using objective and subjective evaluations. Most importantly, the impact of attention GAN has been demonstrated in improved target detection and classification performance using real-infrared videos.<\/jats:p>","DOI":"10.3390\/rs13163257","type":"journal-article","created":{"date-parts":[[2021,8,18]],"date-time":"2021-08-18T22:51:00Z","timestamp":1629327060000},"page":"3257","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":25,"title":["Converting Optical Videos to Infrared Videos Using Attention GAN and Its Impact on Target Detection and Classification Performance"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2466-1212","authenticated-orcid":false,"given":"Mohammad Shahab","family":"Uddin","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA 23625, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Reshad","family":"Hoque","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA 23625, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9320-0858","authenticated-orcid":false,"given":"Kazi Aminul","family":"Islam","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA 23625, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4341-0769","authenticated-orcid":false,"given":"Chiman","family":"Kwan","sequence":"additional","affiliation":[{"name":"Applied Research LLC, Rockville, MD 20850, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Gribben","sequence":"additional","affiliation":[{"name":"Applied Research LLC, Rockville, MD 20850, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0091-6986","authenticated-orcid":false,"given":"Jiang","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA 23625, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,8,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Kwan, C., Chou, B., and Kwan, L.M. (2018, January 25\u201328). A Comparative Study of Conventional and Deep Learning Target Tracking Algorithms for Low Quality Videos. Proceedings of the 15th International Symposium on Neural Networks, Minsk, Belarus.","DOI":"10.1007\/978-3-319-92537-0_60"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Demir, H.S., and Cetin, A.E. (2016, January 25\u201328). Co-difference based object tracking algorithm for infrared videos. Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7532394"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1629","DOI":"10.1007\/s11760-019-01506-4","article-title":"Target tracking and classification directly using compressive sensing camera for SWIR videos","volume":"13","author":"Kwan","year":"2019","journal-title":"J. Signal Image Video Process."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Kwan, C., Chou, B., Yang, J., Rangamani, A., Tran, T., Zhang, J., and Etienne-Cummings, R. (2019). Deep Learning based Target Tracking and Classification for Low Quality Videos Using Coded Aperture Camera. Sensors, 19.","DOI":"10.3390\/s19173702"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Lohit, S., Kulkarni, K., and Turaga, P.K. (2016, January 25\u201328). Direct inference on compressive measurements using convolutional neural networks. Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7532691"},{"key":"ref_6","unstructured":"Adler, A., Elad, M., and Zibulevsky, M. (2016). Compressed Learning: A Deep Neural Network Approach. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Xu, Y., and Kelly, K.F. (2019). Compressed domain image classification using a Dynamic-rate neural network. arXiv.","DOI":"10.1109\/ACCESS.2020.3041807"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Wang, Z.W., Vineet, V., Pittaluga, F., Sinha, S.N., Cossairt, O., and Kang, S.B. (2019, January 16\u201320). Privacy-Preserving Action Recognition Using Coded Aperture Videos. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00007"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Vargas, H., Fonseca, Y., and Arguello, H. (2018, January 3\u20137). Object Detection on Compressive Measurements using Correlation Filters and Sparse Representation. Proceedings of the European Signal Processing Conference (EUSIPCO), Rome, Italy.","DOI":"10.23919\/EUSIPCO.2018.8553312"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"De\u011ferli, A., Aslan, S., Yamac, M., Sankur, B., and Gabbouj, M. (2018, January 26\u201328). Compressively Sensed Image Recognition. Proceedings of the European Workshop on Visual Information Processing (EUVIP), Tampere, Finland.","DOI":"10.1109\/EUVIP.2018.8611657"},{"key":"ref_11","first-page":"28","article-title":"Online reconstruction-free single-pixel image classification","volume":"86","author":"Traver","year":"2018","journal-title":"Image Vis. Comput."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Li, C., and Wang, W. (2018). Detection and Tracking of Moving Targets for Thermal Infrared Video Sequences. Sensors, 18.","DOI":"10.3390\/s18113944"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1016\/j.infrared.2013.08.014","article-title":"Background subtraction based level sets for human segmentation in thermal infrared surveillance systems","volume":"61","author":"Tan","year":"2013","journal-title":"Infrared Phys. Technol."},{"key":"ref_14","unstructured":"Berg, A., Ahlberg, J., and Felsberg, M. (July, January 26). Channel Coded Distribution Field Tracking for Thermal Infrared Imagery. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Kwan, C., Gribben, D., Chou, B., Budavari, B., Larkin, J., Rangamani, A., Tran, T., Zhang, J., and Etienne-Cummings, R. (2020). Real-Time and Deep Learning based Vehicle Detection and Classification using Pixel-Wise Code Exposure Measurements. Electronics, 18.","DOI":"10.3390\/electronics9061014"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhu, J.Y., Park, T., Isola, P., and Efros, A.A. (2017, January 22\u201329). Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.244"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yi, Z., Zhang, H., Tan, P., and Gong, M. (2017, January 22\u201329). Dualgan: Unsupervised dual learning for image-to-image translation. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.310"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Park, T., Efros, A.A., Zhang, R., and Zhu, J.Y. (2020, January 23\u201328). Contrastive learning for unpaired image-to-image translation. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58545-7_19"},{"key":"ref_19","unstructured":"(2020, January 01). ATR Dataset. Available online: https:\/\/www.dsiac.org\/resources\/available-databases\/atr-algorithm-development-image-database\/."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Isola, P., Zhu, J.Y., Zhou, T., and Efros, A.A. (2017, January 21\u201326). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.632"},{"key":"ref_21","first-page":"2672","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_22","unstructured":"Brock, A., Donahue, J., and Simonyan, K. (2018). Large scale GAN training for high fidelity natural image synthesis. arXiv."},{"key":"ref_23","unstructured":"Kim, T., Cha, M., Kim, H., Lee, J.K., and Kim, J. (2017, January 6\u201311). Learning to discover cross-domain relations with generative adversarial networks. Proceedings of the International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Kastaniotis, D., Ntinou, I., Tsourounis, D., Economou, G., and Fotopoulos, S. (2018, January 10\u201312). Attention-aware generative adversarial networks (ATA-GANs). Proceedings of the 2018 IEEE 13th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), Zagorochoria, Greece.","DOI":"10.1109\/IVMSPW.2018.8448850"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Tang, H., Xu, D., Sebe, N., and Yan, Y. (2019, January 14\u201319). Attention-guided generative adversarial networks for unsupervised image-to-image translation. Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary.","DOI":"10.1109\/IJCNN.2019.8851881"},{"key":"ref_26","unstructured":"Zhang, H., Goodfellow, I., Metaxas, D., and Odena, A. (2019, January 9\u201315). Self-attention generative adversarial networks. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"112459","DOI":"10.1109\/ACCESS.2019.2933671","article-title":"NIR to RGB domain translation using asymmetric cycle generative adversarial networks","volume":"7","author":"Sun","year":"2019","journal-title":"IEEE Access"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Perera, P., Abavisani, M., and Patel, V.M. (2018, January 20\u201324). In2i: Unsupervised multi-image-to-image translation using generative adversarial networks. Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China.","DOI":"10.1109\/ICPR.2018.8545464"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Mehri, A., and Sappa, A.D. (2019, January 16\u201317). Colorizing near infrared images through a cyclic adversarial approach of unpaired samples. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00128"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Su\u00e1rez, P.L., Sappa, A.D., and Vintimilla, B.X. (2017, January 16\u201317). Infrared image colorization based on a triplet dcgan architecture. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2017.32"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liu, S., John, V., Blasch, E., Liu, Z., and Huang, Y. (2018, January 18\u201322). IR2VI: Enhanced night environmental perception by unsupervised thermal image translation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00160"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Berg, A., Ahlberg, J., and Felsberg, M. (2018, January 18\u201322). Generating visible spectrum images from thermal infrared. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00159"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1837","DOI":"10.1109\/TIP.2018.2879249","article-title":"Synthetic data generation for end-to-end thermal infrared tracking","volume":"28","author":"Zhang","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Kniaz, V.V., Knyaz, V.A., Hladuvka, J., Kropatsch, W.G., and Mizginov, V. (2018, January 8\u201314). Thermalgan: Multimodal color-to-thermal image translation for person re-identification in multispectral dataset. Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany.","DOI":"10.1007\/978-3-030-11024-6_46"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"149","DOI":"10.5194\/isprs-archives-XLII-2-W12-149-2019","article-title":"Synthetic thermal background and object texture generation using geometric information and gan","volume":"XLII-2\/W12","author":"Mizginov","year":"2019","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Kniaz, V.V., and Mizginov, V.A. (2018). Thermal texture generation and 3d model reconstruction using sfm and gan. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci., 42.","DOI":"10.5194\/isprs-archives-XLII-2-519-2018"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"279","DOI":"10.5194\/isprs-annals-V-3-2020-279-2020","article-title":"Generating artificial near infrared spectral band from rgb image using conditional generative adversarial network","volume":"3","author":"Yuan","year":"2020","journal-title":"ISPRS Ann. Photogramm. Remote. Sens. Spat. Inf. Sci."},{"key":"ref_38","unstructured":"Uddin, M.S., and Li, J. (2020). Generative Adversarial Networks for Visible to Infrared Video Conversion. Recent Advances in Image Restoration with Applications to Real World Problems, IntechOpen."},{"key":"ref_39","first-page":"1099502","article-title":"Improved visible to IR image transformation using synthetic data augmentation with cycle-consistent adversarial networks","volume":"10995","author":"Yun","year":"2019","journal-title":"Pattern Recognition and Tracking XXX"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Abbott, R., Robertson, N.M., del Rincon, J.M., and Connor, B. (2020, January 14\u201319). Unsupervised object detection via LWIR\/RGB translation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00053"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Caballero, J., Ledig, C., Aitken, A., Acosta, A., Totz, J., Wang, Z., and Shi, W. (2017, January 21\u201326). Real-time video super-resolution with spatio-temporal networks and motion compensation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.304"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Jo, Y., Oh, S.W., Kang, J., and Kim, S.J. (2018, January 18\u201322). Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00340"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1109\/TCI.2016.2532323","article-title":"Video super-resolution with convolutional neural networks","volume":"2","author":"Kappeler","year":"2016","journal-title":"IEEE Trans. Comput. Imaging"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Drulea, M., and Nedevschi, S. (2011, January 5\u20137). Total variation regularization of local-global optical flow. Proceedings of the 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), Washington, DC, USA.","DOI":"10.1109\/ITSC.2011.6082986"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Liu, D., Wang, Z., Fan, Y., Liu, X., Wang, Z., Chang, S., and Huang, T. (2017, January 22\u201329). Robust video super-resolution with learned temporal dynamics. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.274"},{"key":"ref_46","unstructured":"Yu, H., Wang, J., Huang, Z., Yang, Y., and Xu, W. (July, January 26). Video paragraph captioning using hierarchical recurrent neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Venugopalan, S., Xu, H., Donahue, J., Rohrbach, M., Mooney, R., and Saenko, K. (2014). Translating videos to natural language using deep recurrent neural networks. arXiv.","DOI":"10.3115\/v1\/N15-1173"},{"key":"ref_48","unstructured":"Huang, Y., Wang, W., and Wang, L. (2015, January 7\u201312). Bidirectional recurrent convolutional networks for multi-frame super-resolution. Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_49","unstructured":"Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., and Woo, W.C. (2015, January 7\u201312). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Sajjadi, M.S., Vemulapalli, R., and Brown, M. (2018, January 18\u201322). Frame-recurrent video super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00693"},{"key":"ref_51","unstructured":"Radford, A., Metz, L., and Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv."},{"key":"ref_52","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_53","unstructured":"(2020, December 01). MOT Challenge. Available online: Motchallenge.net\/."},{"key":"ref_54","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_55","unstructured":"Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. (2016, January 5\u201310). Improved techniques for training gans. Proceedings of the ADVANCES in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_56","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (July, January 26). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_57","unstructured":"Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2017, January 4\u20139). Gans trained by a two time-scale update rule converge to a local Nash equilibrium. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_58","unstructured":"Bi\u0144kowski, M., Sutherland, D.J., Arbel, M., and Gretton, A. (2016). Demystifying mmd gans. arXiv."},{"key":"ref_59","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_60","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Adv. Neural Inf. Process. Syst."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Haris, M., Shakhnarovich, G., and Ukita, N. (2019, January 15\u201320). Recurrent back-projection network for video super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00402"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/16\/3257\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:46:21Z","timestamp":1760165181000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/16\/3257"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,18]]},"references-count":61,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2021,8]]}},"alternative-id":["rs13163257"],"URL":"https:\/\/doi.org\/10.3390\/rs13163257","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,18]]}}}