{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T15:16:31Z","timestamp":1777043791758,"version":"3.51.4"},"reference-count":41,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T00:00:00Z","timestamp":1700092800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"GOETHE-University Frankfurt am Main; \u201cxAIBiology-Hessian.AI\u201d","award":["PID2021-122580NB-I00"],"award-info":[{"award-number":["PID2021-122580NB-I00"]}]},{"name":"R&D project","award":["PID2021-122580NB-I00"],"award-info":[{"award-number":["PID2021-122580NB-I00"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>With the rise in traffic congestion in urban centers, predicting accidents has become paramount for city planning and public safety. This work comprehensively studied the efficacy of modern deep learning (DL) methods in forecasting traffic accidents and enhancing Level-4 and Level-5 (L-4 and L-5) driving assistants with actionable visual and language cues. Using a rich dataset detailing accident occurrences, we juxtaposed the Transformer model against traditional time series models like ARIMA and the more recent Prophet model. Additionally, through detailed analysis, we delved deep into feature importance using principal component analysis (PCA) loadings, uncovering key factors contributing to accidents. We introduce the idea of using real-time interventions with large language models (LLMs) in autonomous driving with the use of lightweight compact LLMs like LLaMA-2 and Zephyr-7b-\u03b1. 
Our exploration extends to the realm of multimodality, through the use of Large Language-and-Vision Assistant (LLaVA)\u2014a bridge between visual and linguistic cues by means of a Visual Language Model (VLM)\u2014in conjunction with deep probabilistic reasoning, enhancing the real-time responsiveness of autonomous driving systems. In this study, we elucidate the advantages of employing large multimodal models within DL and deep probabilistic programming for enhancing the performance and usability of time series forecasting and feature weight importance, particularly in a self-driving scenario. This work paves the way for safer, smarter cities, underpinned by data-driven decision making.<\/jats:p>","DOI":"10.3390\/s23229225","type":"journal-article","created":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T08:19:43Z","timestamp":1700122783000},"page":"9225","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":88,"title":["LLM Multimodal Traffic Accident Forecasting"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5844-7871","authenticated-orcid":false,"given":"I.","family":"de Zarz\u00e0","sequence":"first","affiliation":[{"name":"Informatik und Mathematik, GOETHE-University Frankfurt am Main, 60323 Frankfurt am Main, Germany"},{"name":"Departamento de Inform\u00e1tica de Sistemas y Computadores, Universitat Polit\u00e8cnica de Val\u00e8ncia, 46022 Val\u00e8ncia, Spain"},{"name":"Estudis d\u2019Inform\u00e0tica, Multim\u00e8dia i Telecomunicaci\u00f3, Universitat Oberta de Catalunya, 08018 Barcelona, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8334-4719","authenticated-orcid":false,"given":"J.","family":"de Curt\u00f2","sequence":"additional","affiliation":[{"name":"Informatik und Mathematik, GOETHE-University Frankfurt am Main, 60323 Frankfurt am Main, Germany"},{"name":"Departamento de Inform\u00e1tica de Sistemas y Computadores, Universitat 
Polit\u00e8cnica de Val\u00e8ncia, 46022 Val\u00e8ncia, Spain"},{"name":"Estudis d\u2019Inform\u00e0tica, Multim\u00e8dia i Telecomunicaci\u00f3, Universitat Oberta de Catalunya, 08018 Barcelona, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6439-8076","authenticated-orcid":false,"given":"Gemma","family":"Roig","sequence":"additional","affiliation":[{"name":"Informatik und Mathematik, GOETHE-University Frankfurt am Main, 60323 Frankfurt am Main, Germany"},{"name":"HESSIAN Center for AI (hessian.AI), 64289 Darmstadt, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5729-3041","authenticated-orcid":false,"given":"Carlos T.","family":"Calafate","sequence":"additional","affiliation":[{"name":"Departamento de Inform\u00e1tica de Sistemas y Computadores, Universitat Polit\u00e8cnica de Val\u00e8ncia, 46022 Val\u00e8ncia, Spain"}]}],"member":"1968","published-online":{"date-parts":[[2023,11,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Guo, Y., Tang, Z., and Guo, J. (2020). Could a Smart City Ameliorate Urban Traffic Congestion? A Quasi-Natural Experiment Based on a Smart City Pilot Program in China. Sustainability, 12.","DOI":"10.3390\/su12062291"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"106485","DOI":"10.1016\/j.cie.2020.106485","article-title":"Modeling uncertainties based on data mining approach in emergency service resource allocation","volume":"145","author":"Zonouzi","year":"2020","journal-title":"Comput. Ind. Eng."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.trc.2014.01.005","article-title":"Short-term traffic forecasting: Where we are and where we\u2019re going","volume":"43","author":"Vlahogianni","year":"2014","journal-title":"Transp. Res. Part Emerg. 
Technol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"109670","DOI":"10.1016\/j.patcog.2023.109670","article-title":"A Decomposition Dynamic graph convolutional recurrent network for traffic forecasting","volume":"142","author":"Weng","year":"2023","journal-title":"Pattern Recognit."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"22788","DOI":"10.1109\/ACCESS.2023.3249144","article-title":"Driver Behavior Modeling Towards Autonomous Vehicles: Comprehensive Review","volume":"11","author":"Negash","year":"2023","journal-title":"IEEE Access"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"117921","DOI":"10.1016\/j.eswa.2022.117921","article-title":"Graph neural network for traffic forecasting: A survey","volume":"207","author":"Jiang","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Jiang, W., Luo, J., He, M., and Gu, W. (2023). Graph Neural Network for Traffic Forecasting: The Research Progress. ISPRS Int. J. Geo-Inf., 12.","DOI":"10.3390\/ijgi12030100"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"10967","DOI":"10.1109\/TKDE.2022.3233086","article-title":"A Lightweight and Accurate Spatial-Temporal Transformer for Traffic Forecasting","volume":"35","author":"Li","year":"2022","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_9","first-page":"100550","article-title":"Study on mixed traffic of autonomous vehicles and human-driven vehicles with different cyber interaction approaches","volume":"39","author":"Guo","year":"2023","journal-title":"Veh. Commun."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"102063","DOI":"10.1016\/j.inffus.2023.102063","article-title":"Towards integrated and fine-grained traffic forecasting: A spatio-temporal heterogeneous graph transformer approach","volume":"102","author":"Li","year":"2023","journal-title":"Inf. 
Fusion"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Chen, H., Wang, T., Chen, T., and Deng, W. (2023). Hyperspectral image classification based on fusing S3-PCA, 2D-SSA and random patch network. Remote Sens., 15.","DOI":"10.3390\/rs15133402"},{"key":"ref_12","unstructured":"Pham, H., Dai, Z., Ghiasi, G., Kawaguchi, K., Liu, H., Yu, A.W., Yu, J., Chen, Y., Luong, M., and Wu, Y. (2021). Combined scaling for open-vocabulary image classification. arXiv."},{"key":"ref_13","unstructured":"Peng, B., Li, C., He, P., Galley, M., and Gao, J. (2023). Instruction tuning with GPT-4. arXiv."},{"key":"ref_14","unstructured":"Li, J., Li, D., Savarese, S., and Hoi, S. (2023). Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv."},{"key":"ref_15","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Minderer, M., Gritsenko, A., Austin, N.M., Weissenborn, D., Dosovitskiy, A., Mahendran, A., Arnab, A., Dehghani, M., Shen, Z., and Wang, X. (2022). Simple Open-Vocabulary Object Detection with Vision Transformers. arXiv.","DOI":"10.1007\/978-3-031-20080-9_42"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Minderer, M., Gritsenko, A., and Houlsby, N. (2023). Scaling Open-Vocabulary Object Detection. arXiv.","DOI":"10.1007\/978-3-031-20080-9_42"},{"key":"ref_18","first-page":"1","article-title":"Pyro: Deep Universal Probabilistic Programming","volume":"20","author":"Bingham","year":"2018","journal-title":"J. Mach. Learn. Res."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ahangar, M.N., Ahmed, Q.Z., Khan, F.A., and Hafeez, M. (2021). A survey of autonomous vehicles: Enabling communication technologies and challenges. 
Sensors, 21.","DOI":"10.3390\/s21030706"},{"key":"ref_20","unstructured":"Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., and Beijbom, O. (, January 18\u201324). Nuscenes: A multimodal dataset for autonomous driving. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Yeong, D.J., Velasco-Hernandez, G., Barry, J., and Walsh, J. (2021). Sensor and sensor fusion technology in autonomous vehicles: A review. Sensors, 21.","DOI":"10.20944\/preprints202102.0459.v1"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Hu, Y., Yang, J., Chen, L., Li, K., Sima, C., Zhu, X., Chai, S., Du, S., Lin, T., and Wang, W. (2023, January 18\u201322). Planning-oriented autonomous driving. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01712"},{"key":"ref_23","unstructured":"Mao, J., Qian, Y., Zhao, H., and Wang, Y. (2023). GPT-Driver: Learning to Drive with GPT. arXiv."},{"key":"ref_24","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Shumway, R.H., and Stoffer, D.S. (2017). Time Series Analysis and Its Applications: With R Examples, Springer.","DOI":"10.1007\/978-3-319-52452-8"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1080\/00031305.2017.1380080","article-title":"Forecasting at Scale","volume":"72","author":"Taylor","year":"2018","journal-title":"Am. 
Stat."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Miao, Y., Bai, X., Cao, Y., Liu, Y., Dai, F., Wang, F., Qi, L., and Dou, W. (2023). A Novel Short-Term Traffic Prediction Model based on SVD and ARIMA with Blockchain in Industrial Internet of Things. IEEE Internet Things J.","DOI":"10.1109\/JIOT.2023.3283611"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zaman, M., Saha, S., and Abdelwahed, S. (2023, January 21\u201323). Assessing the Suitability of Different Machine Learning Approaches for Smart Traffic Mobility. Proceedings of the 2023 IEEE Transportation Electrification Conference & Expo (ITEC), Detroit, MI, USA.","DOI":"10.1109\/ITEC55900.2023.10186901"},{"key":"ref_29","unstructured":"Nguyen, N.-L., Vo, H.-T., Lam, G.-H., Nguyen, T.-B., and Do, T.-H. (2022). International Conference on Intelligence of Things, Springer International Publishing."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"6913","DOI":"10.1109\/TNNLS.2022.3183903","article-title":"Bidirectional spatial-temporal adaptive transformer for Urban traffic flow forecasting","volume":"34","author":"Chen","year":"2022","journal-title":"IEEE Trans. Neural Net. Learn. Syst."},{"key":"ref_31","unstructured":"Rob, J. (2018). Forecasting: Principles and Practice, OTexts. Available online: https:\/\/otexts.com\/fpp2\/."},{"key":"ref_32","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021, January 18\u201324). Learning transferable visual models from natural language supervision. Proceedings of the 38th International Conference on Machine Learning, Virtual."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wang, Y., Kordi, Y., Mishra, S., Liu, A., Smith, N.A., Khashabi, D., and Hajishirzi, H. (2022). Self-instruct: Aligning language model with self generated instructions. 
arXiv.","DOI":"10.18653\/v1\/2023.acl-long.754"},{"key":"ref_34","unstructured":"Jiang, A.Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D.S., Casas, D.d., Bressand, F., Lengyel, G., Lample, G., and Saulnier, L. (2023). Mistral 7B. arXiv."},{"key":"ref_35","unstructured":"Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C.D., and Finn, C. (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv."},{"key":"ref_36","unstructured":"Taori, R., Gulrajani, I., Zhang, T., Dubois, Y., Li, X., Guestrin, C., Liang, P., and Hashimoto, T.B. (2023, October 01). Stanford Alpaca: An Instruction-Following Llama Model. Available online: https:\/\/github.com\/tatsu-lab\/stanford_alpaca."},{"key":"ref_37","unstructured":"Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M., Lacroix, T., Rozi\u00e8re, B., Goyal, N., Hambro, E., and Azhar, F. (2023). Llama: Open and efficient foundation language models. arXiv."},{"key":"ref_38","unstructured":"Liu, H., Li, C., Wu, Q., and Lee, Y.J. (2023, January 10\u201316). Visual instruction tuning. Proceedings of the NeurIPS 2023, New Orleans, LA, USA."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"101968","DOI":"10.1016\/j.inffus.2023.101968","article-title":"When CLIP meets cross-modal hashing retrieval: A new strong baseline","volume":"100","author":"Xia","year":"2023","journal-title":"Inf. Fusion"},{"key":"ref_40","unstructured":"Phan, D., Pradhan, N., and Jankowiak, M. (2019). Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro. arXiv."},{"key":"ref_41","unstructured":"Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., and Sutskever, I. (2021, January 18\u201324). Zero-Shot Text-to-Image Generation. 
Proceedings of the 38th International Conference on Machine Learning, Virtual."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/22\/9225\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:24:12Z","timestamp":1760131452000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/22\/9225"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,16]]},"references-count":41,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2023,11]]}},"alternative-id":["s23229225"],"URL":"https:\/\/doi.org\/10.3390\/s23229225","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,16]]}}}