{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T00:05:53Z","timestamp":1780445153587,"version":"3.54.1"},"reference-count":79,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2025,6,3]],"date-time":"2025-06-03T00:00:00Z","timestamp":1748908800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>This study comprehensively reviews and empirically evaluates the application of multimodal large language models (MLLMs) and large vision-language models (VLMs) to object detection in transportation systems. First, we provide background on the potential benefits of MLLMs in transportation applications and review current MLLM technologies reported in previous studies, highlighting their effectiveness and limitations for object detection across various transportation scenarios. Second, we present a taxonomy of end-to-end object detection in transportation applications and outline future directions. Building on this, we present an empirical analysis that tests MLLMs on three real-world transportation problems involving object detection: road safety attribute extraction, safety-critical event detection, and visual reasoning on thermal images. Our findings provide a detailed assessment of MLLM performance, uncovering both strengths and areas for improvement. 
Finally, we discuss practical limitations and challenges of MLLMs in enhancing object detection in transportation, thereby offering a roadmap for future research and development in this critical area.<\/jats:p>","DOI":"10.3390\/computation13060133","type":"journal-article","created":{"date-parts":[[2025,6,3]],"date-time":"2025-06-03T03:57:18Z","timestamp":1748923038000},"page":"133","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6835-8338","authenticated-orcid":false,"given":"Huthaifa I.","family":"Ashqar","sequence":"first","affiliation":[{"name":"AI and Data Science Department, Arab American University, 13 Zababdeh, Jenin P.O. Box 240, Palestine"},{"name":"Artificial Intelligence Program, Fu Foundation School of Engineering and Applied Science, Columbia University, 500 W 120th Street, New York, NY 10027, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5493-2530","authenticated-orcid":false,"given":"Ahmed","family":"Jaber","sequence":"additional","affiliation":[{"name":"Department of Transport Technology and Economics, Budapest University of Technology and Economics, M\u0171egyetem rkp. 
3., H-1111 Budapest, Hungary"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6388-0904","authenticated-orcid":false,"given":"Taqwa I.","family":"Alhadidi","sequence":"additional","affiliation":[{"name":"Civil Engineering Department, Al-Ahliyya Amman University, Al-Saro Al-Salt, Amman 19111, Jordan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2634-4576","authenticated-orcid":false,"given":"Mohammed","family":"Elhenawy","sequence":"additional","affiliation":[{"name":"CARRS-Q, Queensland University of Technology, 130 Victoria Park Rd, Kelvin Grove, Brisbane, QLD 4059, Australia"},{"name":"Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Al-Saro Al-Salt, Amman 19111, Jordan"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,6,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"101350","DOI":"10.1016\/j.compenvurbsys.2019.101350","article-title":"Detecting and mapping traffic signs from Google Street View images using deep learning and GIS","volume":"77","author":"Campbell","year":"2019","journal-title":"Comput. Environ. Urban. Syst."},{"key":"ref_2","first-page":"569","article-title":"A Review on Traffic Monitoring System Techniques","volume":"Volume 742","author":"Ray","year":"2019","journal-title":"Soft Computing: Theories and Applications"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"10501","DOI":"10.1007\/s00521-024-09602-4","article-title":"Optimum sensors allocation for drones multi-target tracking under complex environment using improved prairie dog optimization","volume":"36","author":"Zitar","year":"2024","journal-title":"Neural Comput. 
Appl."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1773","DOI":"10.1109\/TITS.2013.2266661","article-title":"Looking at Vehicles on the Road: A Survey of Vision-Based Vehicle Detection, Tracking, and Behavior Analysis","volume":"14","author":"Sivaraman","year":"2013","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Huang, D., Yan, C., Li, Q., and Peng, X. (2024). From Large Language Models to Large Multimodal Models: A Literature Review. Appl. Sci., 14.","DOI":"10.3390\/app14125068"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1571","DOI":"10.3390\/vehicles6030074","article-title":"Using Multimodal Large Language Models (MLLMs) for Automated Detection of Traffic Safety-Critical Events","volume":"6","author":"Tami","year":"2024","journal-title":"Vehicles"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Jaradat, S., Alhadidi, T.I., Ashqar, H.I., Hossain, A., and Elhenawy, M. (2024). Exploring Traffic Crash Narratives in Jordan Using Text Mining Analytics. arXiv.","DOI":"10.1109\/ICMI60790.2024.10586010"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1894","DOI":"10.3390\/make6030093","article-title":"Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges","volume":"6","author":"Elhenawy","year":"2024","journal-title":"Mach. Learn. Knowl. Extr."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Belhaouari, S.B., and Kraidia, I. (2025). Efficient self-attention with smart pruning for sustainable large language models. Sci. Rep., 15.","DOI":"10.1038\/s41598-025-92586-5"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Luo, S., Chen, W., Tian, W., Liu, R., Hou, L., Zhang, X., Shen, H., Wu, R., Geng, S., and Zhou, Y. (2024). Delving Into Multi-Modal Multi-Task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives. IEEE Trans. 
Intell. Veh., 1\u201325.","DOI":"10.1109\/TIV.2024.3406372"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhang, D., Zheng, H., Yue, W., and Wang, X. (2024, January 17\u201320). Advancing ITS Applications with LLMs: A Survey on Traffic Management, Transportation Safety, and Autonomous Driving. Proceedings of the Rough Sets: International Joint Conference, IJCRS 2024, Halifax, NS, Canada.","DOI":"10.1007\/978-3-031-65668-2_20"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"3152","DOI":"10.1109\/TITS.2019.2929020","article-title":"Deep Learning for Intelligent Transportation Systems: A Survey of Emerging Trends","volume":"21","author":"Veres","year":"2020","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Naveed, Q.N., Alqahtani, H., Khan, R.U., Almakdi, S., Alshehri, M., and Rasheed, M.A.A. (2022). An Intelligent Traffic Surveillance System Using Integrated Wireless Sensor Network and Improved Phase Timing Optimization. Sensors, 22.","DOI":"10.3390\/s22093333"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yoon, S., and Cho, J. (2022). Deep Multimodal Detection in Reduced Visibility Using Thermal Depth Estimation for Autonomous Driving. Sensors, 22.","DOI":"10.3390\/s22145084"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2122","DOI":"10.1007\/s11263-023-01784-z","article-title":"Multi-Modal 3D Object Detection in Autonomous Driving: A Survey","volume":"131","author":"Wang","year":"2023","journal-title":"Int. J. Comput. Vis."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"5783","DOI":"10.1109\/JIOT.2019.2949633","article-title":"Context-Aware Object Detection for Vehicular Networks Based on Edge-Cloud Cooperation","volume":"7","author":"Guo","year":"2020","journal-title":"IEEE Internet Things J."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lee, W.-Y., Jovanov, L., and Philips, W. (2022, January 23\u201327). 
Cross-Modality Attention and Multimodal Fusion Transformer for Pedestrian Detection. Proceedings of the Computer Vision\u2014ECCV 2022 Workshops, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25072-9_41"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Chen, Y., Ye, J., and Wan, X. (2023). TF-YOLO: A Transformer\u2013Fusion-Based YOLO Detector for Multimodal Pedestrian Detection in Autonomous Driving Scenes. World Electr. Veh. J., 14.","DOI":"10.3390\/wevj14120352"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"104074","DOI":"10.1016\/j.tra.2024.104074","article-title":"Applying masked language model for transport mode choice behavior prediction","volume":"184","author":"Yang","year":"2024","journal-title":"Transp. Res. Part A Policy Pract."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1007\/s11633-022-1410-8","article-title":"Large-scale Multi-modal Pre-trained Models: A Comprehensive Survey","volume":"20","author":"Wang","year":"2023","journal-title":"Mach. Intell. Res."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"107160","DOI":"10.1016\/j.patcog.2019.107160","article-title":"Few-shot traffic sign recognition with clustering inductive bias and random neural network","volume":"100","author":"Zhou","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Bansal, A., Sikka, K., Sharma, G., Chellappa, R., and Divakaran, A. (2018, January 8\u201314). Zero-Shot Object Detection. Proceedings of the 15th European Conference, Munich, Germany.","DOI":"10.1007\/978-3-030-01246-5_24"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, Y., Dong, L., and He, T. (2024). A Closer Look at Few-Shot Object Detection. 
Pattern Recognition and Computer Vision, Springer.","DOI":"10.1007\/978-981-99-8543-2_35"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"19841","DOI":"10.1007\/s11042-023-16275-z","article-title":"Deep learning based object detection from multi-modal sensors: An overview","volume":"83","author":"Liu","year":"2023","journal-title":"Multimed. Tools Appl."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"7171","DOI":"10.1007\/s40747-023-01117-0","article-title":"Turning traffic surveillance cameras into intelligent sensors for traffic density estimation","volume":"9","author":"Hu","year":"2023","journal-title":"Complex. Intell. Syst."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.4018\/IJWSR.338222","article-title":"Predictive Analytics in Mental Health Leveraging LLM Embeddings and Machine Learning Models for Social Media Analysis","volume":"21","author":"Radwan","year":"2024","journal-title":"Int. J. Web Serv. Res. (IJWSR)"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1007\/s43762-021-00024-9","article-title":"Street life and pedestrian activities in smart cities: Opportunities and challenges for computational urban science","volume":"1","author":"Fan","year":"2021","journal-title":"Comput. Urban. Sci."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Masri, S., Ashqar, H.I., and Elhenawy, M. (2024). Leveraging Large Language Models (LLMs) for Traffic Management at Urban Intersections: The Case of Mixed Traffic Scenarios. arXiv.","DOI":"10.3390\/vehicles7010011"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Elhenawy, M., Ashqar, H.I., Masoud, M., Almannaa, M.H., Rakotonirainy, A., and Rakha, H.A. (2020). Deep transfer learning for vulnerable road users detection using smartphone sensors data. Remote Sens., 12.","DOI":"10.3390\/rs12213508"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Li, H., Yin, Z., Fan, C., and Wang, X. 
(2023, January 3\u20135). YOLO-MFE: Towards More Accurate Object Detection Using Multiscale Feature Extraction. Proceedings of the Sixth International Conference on Intelligent Computing, Communication, and Devices (ICCD 2023), Hong Kong, China.","DOI":"10.1117\/12.2682803"},{"key":"ref_31","first-page":"1","article-title":"Object Detection Using Open CV and Deep Learning","volume":"6","author":"Lokesh","year":"2022","journal-title":"Int. J. Sci. Res. Eng. Manag."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Nabati, R., and Qi, H. (2019). RRPN: Radar Region Proposal Network for Object Detection in Autonomous Vehicles. arXiv.","DOI":"10.1109\/ICIP.2019.8803392"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"18999","DOI":"10.1007\/s00521-023-08741-4","article-title":"Object detection in traffic videos: An optimized approach using super-resolution and maximal clique algorithm","volume":"35","year":"2023","journal-title":"Neural Comput. Appl."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"12202","DOI":"10.1109\/TITS.2021.3110949","article-title":"The devil is in the details: An efficient convolutional neural network for transport mode detection","volume":"23","author":"Moreau","year":"2021","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1572","DOI":"10.1109\/TITS.2019.2910643","article-title":"Enhanced object detection with deep convolutional neural networks for advanced driving assistance","volume":"21","author":"Wei","year":"2019","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Nafaa, S., Ashour, K., Mohamed, R., Essam, H., Emad, D., Elhenawy, M., Ashqar, H.I., Hassan, A.A., and Alhadidi, T.I. (2024, January 13\u201314). Advancing Roadway Sign Detection with YOLO Models and Transfer Learning. 
Proceedings of the 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI), Mount Pleasant, MI, USA.","DOI":"10.1109\/ICMI60790.2024.10586105"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Nafaa, S., Ashour, K., Mohamed, R., Essam, H., Emad, D., Elhenawy, M., Ashqar, H.I., Hassan, A.A., and Alhadidi, T.I. (2024, January 13\u201314). Automated Pavement Cracks Detection and Classification Using Deep Learning. Proceedings of the 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI), Mount Pleasant, MI, USA.","DOI":"10.1109\/ICMI60790.2024.10586098"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Lu, K., Zhao, F., Xu, X., and Zhang, Y. (2023). An object detection algorithm combining self-attention and YOLOv4 in traffic scene. PLoS ONE, 18.","DOI":"10.1371\/journal.pone.0285654"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhang, J., Jin, J., Ma, Y., and Ren, P. (2023). Lightweight object detection algorithm based on YOLOv5 for unmanned surface vehicles. Front. Mar. 
Sci., 9.","DOI":"10.3389\/fmars.2022.1058401"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"114640","DOI":"10.1109\/ACCESS.2021.3105190","article-title":"Road detection network based on anti-disturbance and variable-scale spatial context detector","volume":"9","author":"Ding","year":"2021","journal-title":"IEEE Access"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"211164","DOI":"10.1109\/ACCESS.2020.3036620","article-title":"Detection of road objects with small appearance in images for autonomous driving in various traffic situations using a deep learning based approach","volume":"8","author":"Li","year":"2020","journal-title":"IEEE Access"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"145182","DOI":"10.1109\/ACCESS.2020.3015251","article-title":"Autonomous Railway Traffic Object Detection Using Feature-Enhanced Single-Shot Detector","volume":"8","author":"Tao","year":"2020","journal-title":"IEEE Access"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Chen, Y., Duan, Z., and Yan, H. (2022, January 25\u201327). R-YOLOv5: A Lightweight Rotation Detector for Remote Sensing. Proceedings of the International Conference on Computer, Artificial Intelligence, and Control Engineering (CAICE 2022), Zhuhai, China.","DOI":"10.1117\/12.2641127"},{"key":"ref_44","first-page":"289","article-title":"\u2018Mass Centre\u2019 Vectorization Algorithm for Vehicle\u2019s Counting Portable Video System","volume":"17","author":"Gaidash","year":"2016","journal-title":"Transp. Telecommun. J."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Knapik, M., Cyganek, B., and Balon, T. (2024). Multimodal Driver Condition Monitoring System Operating in the Far-Infrared Spectrum. 
Electronics, 13.","DOI":"10.3390\/electronics13173502"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"100116","DOI":"10.1016\/j.commtr.2023.100116","article-title":"GPT-4 enhanced multimodal grounding for autonomous driving: Leveraging cross-modal attention with large language models","volume":"4","author":"Liao","year":"2024","journal-title":"Commun. Transp. Res."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"de Curt\u00f2, J., de Zarz\u00e0, I., and Calafate, C.T. (2023). Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles. Drones, 7.","DOI":"10.3390\/drones7020114"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"825","DOI":"10.1007\/s11263-024-02214-4","article-title":"Contextual Object Detection with Multimodal Large Language Models","volume":"133","author":"Zang","year":"2024","journal-title":"Int. J. Comput. Vis."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Chi, F., Wang, Y., Nasiopoulos, P., and Leung, V.C.M. (2024, January 14\u201319). Multi-Modal GPT-4 Aided Action Planning and Reasoning for Self-driving Vehicles. Proceedings of the ICASSP 2024\u20142024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea.","DOI":"10.1109\/ICASSP48485.2024.10446745"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Cai, L., Cheng, X., Zhang, Q., Xue, X., Ding, W., and Pu, J. (2024). OpenAnnotate2: Multi-Modal Auto-Annotating for Autonomous Driving. IEEE Trans. Intell. Veh., 1\u201313.","DOI":"10.1109\/TIV.2024.3381602"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Mu, X., Qin, T., Zhang, S., Xu, C., and Yang, M. (2024, January 2\u20135). Pix2Planning: End-to-End Planning by Vision-language Model for Autonomous Driving on Carla Simulator. 
Proceedings of the 2024 IEEE Intelligent Vehicles Symposium (IV), Jeju Island, Republic of Korea.","DOI":"10.1109\/IV55156.2024.10588479"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Sammoudi, M., Habaybeh, A., Ashqar, H.I., and Elhenawy, M. (2024). Question-Answering (QA) Model for a Personalized Learning Assistant for Arabic Language. arXiv.","DOI":"10.1007\/978-3-031-82377-0_30"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Tami, M., Ashqar, H.I., and Elhenawy, M. (2024). Automated Question Generation for Science Tests in Arabic Language Using NLP Techniques. arXiv.","DOI":"10.1007\/978-3-031-82377-0_24"},{"key":"ref_54","unstructured":"Rouzegar, H., and Makrehchi, M. (2024). Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions. arXiv."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"98254","DOI":"10.1109\/ACCESS.2024.3429396","article-title":"Autonomous Driving Roadway Feature Interpretation Using Integrated Semantic Analysis and Domain Adaptation","volume":"12","author":"Xi","year":"2024","journal-title":"IEEE Access"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"41999","DOI":"10.1109\/ACCESS.2024.3378248","article-title":"Drone-TOOD: A Lightweight Task-Aligned Object Detection Algorithm for Vehicle Detection in UAV Images","volume":"12","author":"Ou","year":"2024","journal-title":"IEEE Access"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Alaba, S.Y., Gurbuz, A.C., and Ball, J.E. (2024). Emerging Trends in Autonomous Vehicle Perception: Multimodal Fusion for 3D Object Detection. World Electr. Veh. J., 15.","DOI":"10.3390\/wevj15010020"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Murozi, A.-F.M., Ishak, S.Z., Nusa, F.N.M., Hoong, A.P.W., and Sulistyono, S. (2022, January 23). The application of international road assessment programme (irap) as a road infrastructure risk assessment tool. 
Proceedings of the 2022 IEEE 13th Control and System Graduate Research Colloquium (ICSGRC), Shah Alam, Malaysia.","DOI":"10.1109\/ICSGRC55096.2022.9845149"},{"key":"ref_59","unstructured":"World Health Organization (2015). Global Status Report on Road Safety 2015, World Health Organization."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Angelo, A.A., Sasai, K., and Kaito, K. (2023). Assessing critical road sections: A decision matrix approach considering safety and pavement condition. Sustainability, 15.","DOI":"10.3390\/su15097244"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Alhadidi, T.I., Jaber, A., Jaradat, S., Ashqar, H.I., and Elhenawy, M. (2024). Object Detection Using Oriented Window Learning Vision Transformer: Roadway Assets Recognition. arXiv, Available online: https:\/\/api.semanticscholar.org\/CorpusID:270560797.","DOI":"10.1007\/978-3-031-82377-0_41"},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1109\/MITS.2024.3381793","article-title":"Receive, reason, and react: Drive as you say, with large language models in autonomous vehicles","volume":"16","author":"Cui","year":"2024","journal-title":"IEEE Intell. Transp. Syst. Mag."},{"key":"ref_63","unstructured":"Sha, H., Mu, Y., Jiang, Y., Chen, L., Xu, C., Luo, P., Li, S.E., Tomizuka, M., Zhan, W., and Ding, M. (2023). LanguageMPC: Large language models as decision makers for autonomous driving. arXiv."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Xu, Z., Zhang, Y., Xie, E., Zhao, Z., Guo, Y., Wong, K.-Y.K., Li, Z., and Zhao, H. (2023). DriveGPT4: Interpretable end-to-end autonomous driving via large language model. arXiv.","DOI":"10.1109\/LRA.2024.3440097"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Yu, X., and Marinov, M. (2020). A study on recent developments and issues with obstacle detection systems for automated vehicles. 
Sustainability, 12.","DOI":"10.3390\/su12083281"},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"103754","DOI":"10.1016\/j.infrared.2021.103754","article-title":"Infrared machine vision and infrared thermography with deep learning: A review","volume":"116","author":"He","year":"2021","journal-title":"Infrared Phys. Technol."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"326","DOI":"10.1109\/TIV.2021.3122898","article-title":"A progressive review: Emerging technologies for ADAS driven solutions","volume":"7","author":"Nidamanuri","year":"2021","journal-title":"IEEE Trans. Intell. Veh."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"125459","DOI":"10.1109\/ACCESS.2020.3007481","article-title":"Thermal object detection in difficult weather conditions using YOLO","volume":"8","author":"Pobar","year":"2020","journal-title":"IEEE Access"},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1002\/rob.21985","article-title":"Object detection, recognition, and tracking from UAVs using a thermal camera","volume":"38","author":"Leira","year":"2021","journal-title":"J. Field Robot."},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Kri\u0161to, M., and Iva\u0161i\u0107-Kos, M. (2019, January 20\u201324). Thermal imaging dataset for person detection. Proceedings of the 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia.","DOI":"10.23919\/MIPRO.2019.8757208"},{"key":"ref_71","unstructured":"Yuksekgonul, M., Bianchi, F., Kalluri, P., Jurafsky, D., and Zou, J. (2023, January 1\u20135). When and why vision-language models behave like bags-of-words, and what to do about it?. Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda."},{"key":"ref_72","unstructured":"Liu, H., Xue, W., Chen, Y., Chen, D., Zhao, X., Wang, K., Hou, L., Li, R., and Peng, W. (2024). 
A survey on hallucination in large vision-language models. arXiv."},{"key":"ref_73","unstructured":"Zhang, J., Hu, J., Khayatkhoei, M., Ilievski, F., and Sun, M. (2024). Exploring perceptual limitation of multimodal large language models. arXiv."},{"key":"ref_74","unstructured":"Yang, S., Zhai, B., You, Q., Yuan, J., Yang, H., and Xu, C. (2024). Law of Vision Representation in MLLMs. arXiv."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Li, Y., Du, Y., Zhou, K., Wang, J., Zhao, W.X., and Wen, J.-R. (2023). Evaluating object hallucination in large vision-language models. arXiv.","DOI":"10.18653\/v1\/2023.emnlp-main.20"},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"2422","DOI":"10.3390\/smartcities7050095","article-title":"Multitask Learning for Crash Analysis: A Fine-Tuned LLM Framework Using Twitter Data","volume":"7","author":"Jaradat","year":"2024","journal-title":"Smart Cities"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Masri, S., Raddad, Y., Khandaqji, F., Ashqar, H.I., and Elhenawy, M. (2024). Transformer Models in Education: Summarizing Science Textbooks with AraBART, MT5, AraT5, and mBART. arXiv.","DOI":"10.1007\/978-3-031-82377-0_25"},{"key":"ref_78","unstructured":"Bai, G., Chai, Z., Ling, C., Wang, S., Lu, J., Zhang, N., Shi, T., Yu, Z., Zhu, M., and Zhang, Y. (2024). Beyond efficiency: A systematic survey of resource-efficient large language models. arXiv."},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C.H., Gonzalez, J., Zhang, H., and Stoica, I. (2023, January 23\u201326). Efficient memory management for large language model serving with PagedAttention. 
Proceedings of the 29th Symposium on Operating Systems Principles, Koblenz, Germany.","DOI":"10.1145\/3600006.3613165"}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/6\/133\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:46:18Z","timestamp":1760031978000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/6\/133"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,3]]},"references-count":79,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2025,6]]}},"alternative-id":["computation13060133"],"URL":"https:\/\/doi.org\/10.3390\/computation13060133","relation":{},"ISSN":["2079-3197"],"issn-type":[{"value":"2079-3197","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,3]]}}}