{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T16:17:07Z","timestamp":1778084227263,"version":"3.51.4"},"reference-count":59,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2022,3,27]],"date-time":"2022-03-27T00:00:00Z","timestamp":1648339200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>SWIR imaging bears considerable advantages over visible-light (color) and thermal images in certain challenging propagation conditions. Thus, the SWIR imaging channel is frequently used in multi-spectral imaging systems (MSIS) for long-range surveillance in combination with color and thermal imaging to improve the probability of correct operation in various day, night and climate conditions. Integration of deep-learning (DL)-based real-time object detection in MSIS enables an increase in efficient utilization for complex long-range surveillance solutions such as border or critical assets control. Unfortunately, a lack of datasets for DL-based object detection models training for the SWIR channel limits their performance. To overcome this, by using the MSIS setting we propose a new cross-spectral automatic data annotation methodology for SWIR channel training dataset creation, in which the visible-light channel provides a source for detecting object types and bounding boxes which are then transformed to the SWIR channel. A mathematical image transformation that overcomes differences between the SWIR and color channel and their image distortion effects for various magnifications are explained in detail. With the proposed cross-spectral methodology, the goal of the paper is to improve object detection in SWIR images captured in challenging outdoor scenes. 
Experimental tests for two object types (cars and persons) using a state-of-the-art YOLOX model demonstrate that retraining with the proposed automatic cross-spectrally created SWIR image dataset significantly improves average detection precision. We achieved excellent improvements in detection performance in various variants of the YOLOX model (nano, tiny and x).<\/jats:p>","DOI":"10.3390\/s22072562","type":"journal-article","created":{"date-parts":[[2022,3,27]],"date-time":"2022-03-27T21:31:25Z","timestamp":1648416685000},"page":"2562","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Deep Learning Based SWIR Object Detection in Long-Range Surveillance Systems: An Automated Cross-Spectral Approach"],"prefix":"10.3390","volume":"22","author":[{"given":"Milo\u0161 S.","family":"Pavlovi\u0107","sequence":"first","affiliation":[{"name":"School of Electrical Engineering, University of Belgrade, Bul. Kralja Aleksandara 73, 11120 Belgrade, Serbia"},{"name":"Vlatacom Institute of High Technologies, Milutina Milankovica 5, 11070 Belgrade, Serbia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3049-8105","authenticated-orcid":false,"given":"Petar D.","family":"Milanovi\u0107","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering, University of Belgrade, Bul. 
Kralja Aleksandara 73, 11120 Belgrade, Serbia"},{"name":"Vlatacom Institute of High Technologies, Milutina Milankovica 5, 11070 Belgrade, Serbia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9064-7059","authenticated-orcid":false,"given":"Milo\u0161 S.","family":"Stankovi\u0107","sequence":"additional","affiliation":[{"name":"Vlatacom Institute of High Technologies, Milutina Milankovica 5, 11070 Belgrade, Serbia"},{"name":"Faculty of Technical Sciences, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2556-8212","authenticated-orcid":false,"given":"Dragana B.","family":"Peri\u0107","sequence":"additional","affiliation":[{"name":"Vlatacom Institute of High Technologies, Milutina Milankovica 5, 11070 Belgrade, Serbia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5006-7786","authenticated-orcid":false,"given":"Ilija V.","family":"Popadi\u0107","sequence":"additional","affiliation":[{"name":"Vlatacom Institute of High Technologies, Milutina Milankovica 5, 11070 Belgrade, Serbia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7945-5571","authenticated-orcid":false,"given":"Miroslav V.","family":"Peri\u0107","sequence":"additional","affiliation":[{"name":"Vlatacom Institute of High Technologies, Milutina Milankovica 5, 11070 Belgrade, Serbia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Kolekar, M.H. (2019). Intelligent Video Surveillance Systems: An Algorithmic Approach, CRC Press\/Taylor & Francis Group.","DOI":"10.1201\/9781315153865"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Peri\u0107, D., Livada, B., Peri\u0107, M., and Vuji\u0107, S. (2019). 
Thermal Imager Range: Predictions, Expectations, and Reality. Sensors, 19.","DOI":"10.3390\/s19153313"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1007\/s00138-013-0570-5","article-title":"Thermal cameras and applications: A survey","volume":"25","author":"Gade","year":"2013","journal-title":"Mach. Vis. Appl."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"337","DOI":"10.2298\/SJEE2003337S","article-title":"Big Data and development of Smart City: System Architecture and Practical Public Safety Example","volume":"17","year":"2020","journal-title":"Serb. J. Electr. Eng."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"125459","DOI":"10.1109\/ACCESS.2020.3007481","article-title":"Thermal Object Detection in Difficult Weather Conditions Using YOLO","volume":"8","author":"Kristo","year":"2020","journal-title":"IEEE Access"},{"key":"ref_6","first-page":"69390I","article-title":"Overview of SWIR Detectors, Cameras, and Applications","volume":"Volume 6939","author":"Hansen","year":"2008","journal-title":"Thermosense Xxx"},{"key":"ref_7","unstructured":"Driggers, R.G., Hodgkin, V., and Vollmerhausen, R. (May, January 30). What Good Is SWIR? Passive Day Comparison of VIS, NIR, and SWIR. Proceedings of the Infrared Imaging Systems: Design, Analysis, Modeling, and Testing XXIV, Baltimore, MD, USA."},{"key":"ref_8","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press."},{"key":"ref_9","unstructured":"(2022, February 02). Available online: https:\/\/pytorch.org\/hub\/nvidia_deeplearningexamples_ssd\/."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_11","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. 
arXiv."},{"key":"ref_12","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). Yolov4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_13","unstructured":"(2022, February 02). Available online: https:\/\/github.com\/ultralytics\/yolov5."},{"key":"ref_14","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). Yolox: Exceeding Yolo Series in 2021. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Iwata, S., Kawanishi, Y., Deguchi, D., Ide, I., Murase, H., and Aizawa, T. (2021, January 10\u201315). LFIR2Pose: Pose Estimation from an Extremely Low-resolution FIR image Sequence. Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy.","DOI":"10.1109\/ICPR48806.2021.9412484"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Le, H., Smailis, C., Shi, L., and Kakadiaris, I. (2020, January 1\u20135). EDGE20: A Cross Spectral Evaluation Dataset for Multiple Surveillance Problems. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Snowmass, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093573"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Kim, M., Joung, S., Park, K., Kim, S., and Sohn, K. (2019, January 22\u201325). Unpaired Cross-Spectral Pedestrian Detection Via Adversarial Feature Learning. Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803098"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1364\/JOSA.47.000491","article-title":"Transmission by Haze and Fog in the Spectral Region 0.35 to 10 Microns","volume":"47","author":"Arnulf","year":"1957","journal-title":"J. Opt. Soc. Am."},{"key":"ref_19","unstructured":"(2022, February 02). 
Available online: https:\/\/www.vlatacominstitute.com\/_files\/ugd\/510d2b_ab410776328144979064c9cfa9bda036.pdf."},{"key":"ref_20","unstructured":"Peri\u0107, D., and Livada, B. (2017, January 5\u20138). Analysis of SWIR Imagers Application in Electro-Optical Systems. Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Kladovo, Serbia."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ni, Y., Bouvier, C., Arion, B., and Noguier, V. (2016, January 17\u201321). Wide Dynamic Logarithmic InGaAs Sensor Suitable for Eye-Safe Active Imaging. Proceedings of the SPIE Commercial + Scientific Sensing and Imaging, Baltimore, MD, USA.","DOI":"10.1117\/12.2224079"},{"key":"ref_22","unstructured":"Rankin, A.L., and Matthies, L.H. (2008). Daytime Mud Detection for Unmanned Ground Vehicle Autonomous Navigation, Jet Propulsion Laboratory, California Institute of Technology."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1002\/rob.20341","article-title":"Passive sensor evaluation for unmanned ground vehicle mud detection","volume":"27","author":"Rankin","year":"2010","journal-title":"J. Field Robot."},{"key":"ref_24","first-page":"87120J","article-title":"Investigating Gait Recognition in the Short-Wave Infrared (SWIR) Spectrum: Dataset and Challenges","volume":"Volume 8712","author":"DeCann","year":"2013","journal-title":"Biometric and Surveillance Technology for Human and Activity Identification X"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lemoff, B.E., Martin, R.B., Sluch, M., Kafka, K.M., McCormick, W., and Ice, R. (2013). Long-range Night\/Day Human Identification Using Active-SWIR Imaging. Infrared Technology and Applications XXXIX, International Society for Optics and Photonics.","DOI":"10.1117\/12.2016335"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Bertozzi, M., Fedriga, R.I., Miron, A., and Reverchon, J.L. (2013, January 11\u201313). 
Pedestrian Detection in Poor Visibility Conditions: Would SWIR Help?. Proceedings of the International Conference on Image Analysis and Processing, Naples, Italy.","DOI":"10.1007\/978-3-642-41184-7_24"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Miron, A., Bensrhair, A., Fedriga, R.I., and Broggi, A. (2013, January 6\u20139). SWIR Images Evaluation for Pedestrian Detection in Clear Visibility Conditions. Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), The Hague, The Netherlands.","DOI":"10.1109\/ITSC.2013.6728257"},{"key":"ref_28","first-page":"90703I","article-title":"Automated, Long-Range, Night\/Day, Active-SWIR Face Recognition System","volume":"Volume 9070","author":"Lemoff","year":"2014","journal-title":"Infrared Technology and Applications XL"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"8570","DOI":"10.3390\/s150408570","article-title":"Pedestrian Detection in Far-Infrared Daytime Images Using a Hierarchical Codebook of SURF","volume":"15","author":"Besbes","year":"2015","journal-title":"Sensors"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"99760K","DOI":"10.1117\/12.2238811","article-title":"Identifying Vehicles with VNIR-SWIR Hyperspectral Imagery: Sources of Distinguishability and Confusion","volume":"Volume 9976","author":"Sundberg","year":"2016","journal-title":"Imaging Spectrometry XXI"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Kwan, C., Chou, B., Echavarren, A., Budavari, B., Li, J., and Tran, T. (2018, January 8\u201310). Compressive Vehicle Tracking Using Deep Learning. 
Proceedings of the 2018 9th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA.","DOI":"10.1109\/UEMCON.2018.8796778"},{"key":"ref_32","first-page":"1099506","article-title":"Compressive Object Tracking and Classification Using Deep Learning for Infrared Videos","volume":"10995","author":"Kwan","year":"2019","journal-title":"Pattern Recognition and Tracking XXX"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1629","DOI":"10.1007\/s11760-019-01506-4","article-title":"Target tracking and classification using compressive sensing camera for SWIR videos","volume":"13","author":"Kwan","year":"2019","journal-title":"Signal Image Video Process."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Kwan, C. (2019, January 26\u201328). Object Tracking and Classification in Videos Using Compressive Measurements. Proceedings of the 3rd International Conference on Vision, Image and Signal Processing, Vancouver, BC, Canada.","DOI":"10.1145\/3387168.3387188"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Kandylakis, Z., Vasili, K., and Karantzalos, K. (2019). Fusing Multimodal Video Data for Detecting Moving Objects\/Targets in Challenging Indoor and Outdoor Scenes. Remote Sens., 11.","DOI":"10.3390\/rs11040446"},{"key":"ref_36","first-page":"2969239","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"28","author":"Ren","year":"2015","journal-title":"Adv. Neural Inf. Processing Syst."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"637","DOI":"10.5194\/isprs-archives-XLIII-B2-2020-637-2020","article-title":"Semantic scene understanding for the autonomous platform","volume":"XLIII-B2-2","author":"Vishnyakov","year":"2020","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. 
Sci."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Latinovi\u0107, N., Popadi\u0107, I., Tomi\u0107, B., Simi\u0107, A., Milanovi\u0107, P., Nijem\u010devi\u0107, S., Peri\u0107, M., and Veinovi\u0107, M. (2022). Signal Processing Platform for Long-Range Multi-Spectral Electro-Optical Systems. Sensors, 22.","DOI":"10.3390\/s22031294"},{"key":"ref_39","unstructured":"Livada, B., Peric, D., and Peric, M. (2017, January 5\u20138). Challenges of Laser Range Finder Integration in Electro-Optical Surveillance System. Proceedings of the 4th International Conference on Electrical, Electronic, and Computing Engineering (IcETRAN 2017), Kladovo, Serbia."},{"key":"ref_40","unstructured":"Mambo, S. (2018). Optimisation and Performance Evaluation in Image Registration Technique. [Ph.D. Thesis, Tshwane University of Technology]."},{"key":"ref_41","unstructured":"(2022, February 02). Available online: https:\/\/github.com\/AlexeyAB\/Yolo_mark."},{"key":"ref_42","unstructured":"(2022, February 02). Available online: https:\/\/github.com\/cocodataset\/cocoapi\/blob\/master\/PythonAPI\/pycocotools\/cocoeval.py."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Zitnick, C.L., and Doll\u00e1r, P. (2015). Microsoft COCO: Common Objects in Context. arXiv, Available online: https:\/\/cocodataset.org\/.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. 
Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Liao, H.Y.M., Wu, Y.H., Chen, P.Y., Hsieh, J.W., and Yeh, I.H. (2020, January 14\u201319). CSPNet: A New Backbone that Can Enhance Learning Capability of CNN. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPR Workshop), Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/j.ins.2020.02.067","article-title":"DC-SPP-YOLO: Dense connection and spatial pyramid pooling based YOLO for object detection","volume":"522","author":"Huang","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Dewi, C., Chen, R.-C., Yu, H., and Jiang, X. (2021). Robust detection method for improving small traffic sign recognition based on spatial pyramid pooling. J. Ambient Intell. Humaniz. 
Comput., 1\u201318.","DOI":"10.1007\/s12652-021-03584-0"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path Aggregation Network for Instance Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Yao, J., Qi, J., Zhang, J., Shao, H., Yang, J., and Li, X. (2021). A Real-Time Detection Algorithm for Kiwifruit Defects Based on YOLOv5. Electronics, 10.","DOI":"10.3390\/electronics10141711"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Zhao, J., Zhang, X., Yan, J., Qiu, X., Yao, X., Tian, Y., Zhu, Y., and Cao, W. (2021). A Wheat Spike Detection Method in UAV Images Based on Improved YOLOv5. Remote Sens., 13.","DOI":"10.3390\/rs13163095"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Law, H., and Deng, J. (2018, January 8\u201314). Cornernet: Detecting Objects as Paired Keypoints. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (2019, January 27\u201328). Fcos: Fully Convolutional One-Stage Object Detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"ref_58","unstructured":"Zhou, X., Wang, D., and Kr\u00e4henb\u00fchl, P. (2019). Objects as points. 
arXiv."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Ge, Z., Liu, S., Li, Z., Yoshie, O., and Sun, J. (2021, January 20\u201325). OTA: Optimal Transport Assignment for Object Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00037"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/7\/2562\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:44:22Z","timestamp":1760136262000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/7\/2562"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,27]]},"references-count":59,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["s22072562"],"URL":"https:\/\/doi.org\/10.3390\/s22072562","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,27]]}}}