{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T07:38:51Z","timestamp":1769585931102,"version":"3.49.0"},"reference-count":68,"publisher":"Springer Science and Business Media LLC","issue":"22","license":[{"start":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T00:00:00Z","timestamp":1751328000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T00:00:00Z","timestamp":1751328000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput &amp; Applic"],"published-print":{"date-parts":[[2025,8]]},"DOI":"10.1007\/s00521-025-11435-8","type":"journal-article","created":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T16:02:22Z","timestamp":1751385742000},"page":"18067-18090","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["OneEncoder: a lightweight framework for efficient multimodal 
training"],"prefix":"10.1007","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7852-7003","authenticated-orcid":false,"given":"Bilal","family":"Faye","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hanane","family":"Azzag","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mustapha","family":"Lebbah","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Djamel","family":"Bouchaffra","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,7,1]]},"reference":[{"key":"11435_CR1","doi-asserted-by":"publisher","unstructured":"Bi X, Chen D, Chen G, Chen S, Dai D, Deng C, Ding H, Dong K, Du Q, Fu Z, et al. (2024) Deepseek LLM: scaling open-source language models with longtermism. CoRR abs\/2401.02954 https:\/\/doi.org\/10.48550\/ARXIV.2401.02954arXiv:2401.02954","DOI":"10.48550\/ARXIV.2401.02954"},{"key":"11435_CR2","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N (2021) An image is worth 16x16 words: transformers for image recognition at scale. In: International conference on learning representations"},{"key":"11435_CR3","doi-asserted-by":"crossref","unstructured":"Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: 2021 IEEE\/CVF international conference on computer vision (ICCV), pp 9992\u201310002","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"11435_CR4","first-page":"12449","volume":"33","author":"A Baevski","year":"2020","unstructured":"Baevski A, Zhou Y, Mohamed A, Auli M (2020) wav2vec 2.0: a framework for self-supervised learning of speech representations. 
Adv Neural Inf Process Syst 33:12449\u201312460","journal-title":"Adv Neural Inf Process Syst"},{"key":"11435_CR5","doi-asserted-by":"publisher","first-page":"3256","DOI":"10.1109\/TASLP.2024.3417347","volume":"32","author":"Y Ai","year":"2024","unstructured":"Ai Y, Jiang X-H, Lu Y-X, Du H-P, Ling Z-H (2024) Apcodec: a neural audio codec with parallel amplitude and phase spectrum encoding and decoding. IEEE ACM Trans Audio Speech Lang Process 32:3256\u20133269","journal-title":"IEEE ACM Trans Audio Speech Lang Process"},{"key":"11435_CR6","unstructured":"Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, et al. (2021) Learning transferable visual models from natural language supervision. In: International conference on machine learning, PMLR, vol 139, pp 8748\u20138763"},{"key":"11435_CR7","unstructured":"Ramesh A, Pavlov M, Goh G, Gray S, Voss C, Radford A, Chen M, Sutskever I (2021) Zero-shot text-to-image generation. In: International conference on machine learning, Pmlr, vol 139, pp 8821\u20138831"},{"key":"11435_CR8","doi-asserted-by":"crossref","unstructured":"Guzhov A, Raue F, Hees J, Dengel A (2022) Audioclip: Extending clip to image, text and audio. In: ICASSP 2022-2022 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 976\u2013980","DOI":"10.1109\/ICASSP43922.2022.9747631"},{"key":"11435_CR9","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30 pp 5998\u20136008"},{"key":"11435_CR10","first-page":"23716","volume":"35","author":"J-B Alayrac","year":"2022","unstructured":"Alayrac J-B, Donahue J, Luc P, Miech A, Barr I, Hasson Y, Lenc K, Mensch A, Millican K, Reynolds M et al (2022) Flamingo: a visual language model for few-shot learning. 
Adv Neural Inf Process Syst 35:23716\u201323736","journal-title":"Adv Neural Inf Process Syst"},{"key":"11435_CR11","unstructured":"Zhang Y, Jiang H, Miura Y, Manning CD, Langlotz CP (2022) Contrastive learning of medical visual representations from paired images and text. In: Machine learning for healthcare conference, PMLR vol, 182 pp 2\u201325"},{"key":"11435_CR12","unstructured":"Jia C, Yang Y, Xia Y, Chen Y-T, Parekh Z, Pham H, Le Q, Sung Y-H, Li Z, Duerig T (2021) Scaling up visual and vision-language representation learning with noisy text supervision. In: International conference on machine learning, PMLR, vol 139, pp 4904\u20134916"},{"key":"11435_CR13","unstructured":"Radford A, Kim JW, Xu T, Brockman G, McLeavey C, Sutskever I (2023) Robust speech recognition via large-scale weak supervision. In: International conference on machine learning, 202, pp PMLR 28492\u201328518"},{"key":"11435_CR14","doi-asserted-by":"publisher","unstructured":"Yariv G, Gat I, Wolf L, Adi Y, Schwartz I (2023) Audiotoken: adaptation of text-conditioned diffusion models for audio-to-image generation. CoRR abs\/2305.13050 https:\/\/doi.org\/10.48550\/ARXIV.2305.13050arXiv:2305.13050","DOI":"10.48550\/ARXIV.2305.13050"},{"key":"11435_CR15","doi-asserted-by":"publisher","unstructured":"Jiang H, Zhang J, Huang R, Ge C, Ni Z, Lu J, Zhou J, Song S, Huang G (2022) Cross-modal adapter for text-video retrieval. CoRR abs\/2211.09623 https:\/\/doi.org\/10.48550\/ARXIV.2211.09623arXiv:2211.09623","DOI":"10.48550\/ARXIV.2211.09623"},{"key":"11435_CR16","doi-asserted-by":"crossref","unstructured":"Girdhar R, El-Nouby A, Liu Z, Singh M, Alwala KV, Joulin A, Misra I (2023) Imagebind: one embedding space to bind them all. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 15180\u201315190","DOI":"10.1109\/CVPR52729.2023.01457"},{"key":"11435_CR17","unstructured":"Wu S, Fei H, Qu L, Ji W, Chua T-S (2024) Next-gpt: Any-to-any multimodal LLM. 
In: Proceedings of the international conference on machine learning, pp 53366\u201353397"},{"key":"11435_CR18","doi-asserted-by":"crossref","unstructured":"Panagopoulou A, Xue L, Yu N, Li J, Li D, Joty S, Xu R, Savarese S, Xiong C, Niebles JC (2024) X-instructblip: a framework for aligning image, 3d, audio, video to LLMS and its emergent cross-modal reasoning. In: European conference on computer vision, Springer, pp 177\u2013197","DOI":"10.1007\/978-3-031-72995-9_11"},{"key":"11435_CR19","unstructured":"Zhang Y, Gong K, Zhang K, Li H, Qiao Y, Ouyang W, Yue X (2023) Meta-transformer: a unified framework for multimodal learning. arXiv preprint arXiv:2307.10802"},{"key":"11435_CR20","doi-asserted-by":"crossref","unstructured":"Han J, Gong K, Zhang Y, Wang J, Zhang K, Lin D, Qiao Y, Gao P, Yue X (2024) Onellm: one framework to align all modalities with language. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 26584\u201326595","DOI":"10.1109\/CVPR52733.2024.02510"},{"key":"11435_CR21","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K (2019) Bert: pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies (NAACL-HLT), pp 4171\u20134186"},{"key":"11435_CR22","first-page":"10078","volume":"35","author":"Z Tong","year":"2022","unstructured":"Tong Z, Song Y, Wang J, Wang L (2022) Videomae: masked autoencoders are data-efficient learners for self-supervised video pre-training. Adv Neural Inf Process Syst 35:10078\u201310093","journal-title":"Adv Neural Inf Process Syst"},{"key":"11435_CR23","unstructured":"Ba JL, Kiros JR, Hinton GE (2016) Layer normalization. 
arXiv preprint arXiv:1607.06450"},{"issue":"2","key":"11435_CR24","first-page":"691","volume":"39","author":"C-Y Wang","year":"2023","unstructured":"Wang C-Y, Yeh I-H, Liao H-YM (2023) You only learn one representation: unified network for multiple tasks. J Inf Sci Eng 39(2):691\u2013709","journal-title":"J Inf Sci Eng"},{"key":"11435_CR25","doi-asserted-by":"crossref","unstructured":"Wei X, Zhang T, Li Y, Zhang Y, Wu F (2020) Multi-modality cross attention network for image and sentence matching. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 10938\u201310947","DOI":"10.1109\/CVPR42600.2020.01095"},{"key":"11435_CR26","unstructured":"Oord A, et al. (2018) Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748"},{"key":"11435_CR27","doi-asserted-by":"crossref","unstructured":"Desai K, Johnson J (2021) Virtex: Learning visual representations from textual annotations. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 11162\u201311173","DOI":"10.1109\/CVPR46437.2021.01101"},{"key":"11435_CR28","unstructured":"Chen X, Fang H, Lin T-Y, Vedantam R, Gupta S, Doll\u00e1r P, Zitnick CL (2015) Microsoft coco captions: data collection and evaluation server. arXiv preprint arXiv:1504.00325"},{"key":"11435_CR29","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1162\/tacl_a_00166","volume":"2","author":"P Young","year":"2014","unstructured":"Young P, Lai A, Hodosh M, Hockenmaier J (2014) From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Trans Assoc Comput Linguist 2:67\u201378","journal-title":"Trans Assoc Comput Linguist"},{"key":"11435_CR30","doi-asserted-by":"crossref","unstructured":"Sidorov O, Hu R, Rohrbach M, Singh A (2020) Textcaps: a dataset for image captioning with reading comprehension. 
In: Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part II 16, Springer, pp 742\u2013758","DOI":"10.1007\/978-3-030-58536-5_44"},{"key":"11435_CR31","doi-asserted-by":"crossref","unstructured":"Panayotov V, Chen G, Povey D, Khudanpur S (2015) Librispeech: an ASR corpus based on public domain audio books. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 5206\u20135210","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"11435_CR32","doi-asserted-by":"crossref","unstructured":"Xu J, Mei T, Yao T, Rui Y (2016) MSR-VTT: a large video description dataset for bridging video and language. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5288\u20135296","DOI":"10.1109\/CVPR.2016.571"},{"key":"11435_CR33","unstructured":"Malinowski M, Fritz M (2014) A multi-world approach to question answering about real-world scenes based on uncertain input. Advances in neural information processing systems 27"},{"key":"11435_CR34","unstructured":"Krizhevsky A, Nair V, Hinton G (2010) Cifar-10 (Canadian institute for advanced research). 5(4):1. http:\/\/www.cs.toronto.edu\/kriz\/cifar.html"},{"key":"11435_CR35","doi-asserted-by":"crossref","unstructured":"Parkhi OM, Vedaldi A, Zisserman A, Jawahar C (2012) Cats and dogs. In: 2012 IEEE conference on computer vision and pattern recognition, IEEE, pp 3498\u20133505","DOI":"10.1109\/CVPR.2012.6248092"},{"issue":"1097\u20131105","key":"11435_CR36","first-page":"26","volume":"25","author":"A Krizhevsky","year":"2012","unstructured":"Krizhevsky A, Nair V (2012) Cifar-100 (Canadian institute for advanced research). 30 [65] alex krizhevsky, ilya sutskever, and geoffrey e hinton. imagenet classification with deep convolutional neural networks. 
Adv Neural Inf Process Syst 25(1097\u20131105):26","journal-title":"Adv Neural Inf Process Syst"},{"key":"11435_CR37","doi-asserted-by":"crossref","unstructured":"Fei-Fei L, Fergus R, Perona P (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: 2004 conference on computer vision and pattern recognition workshop, IEEE, pp. 178\u2013178","DOI":"10.1109\/CVPR.2004.383"},{"issue":"7","key":"11435_CR38","first-page":"3","volume":"7","author":"Y Le","year":"2015","unstructured":"Le Y, Yang X (2015) Tiny imagenet visual recognition challenge. CS 231N 7(7):3","journal-title":"CS 231N"},{"key":"11435_CR39","doi-asserted-by":"crossref","unstructured":"Socher R, Perelygin A, Wu JY, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1631\u20131642","DOI":"10.18653\/v1\/D13-1170"},{"key":"11435_CR40","doi-asserted-by":"crossref","unstructured":"Voorhees EM, et al. (1999) The trec-8 question answering track report. In: Trec, 99:77\u201382","DOI":"10.6028\/NIST.SP.500-246.qa-overview"},{"key":"11435_CR41","doi-asserted-by":"crossref","unstructured":"Saravia E, Liu H-CT, Huang Y-C, Wu J, Chen Y (2018) Carer: contextualized affect representations for emotion recognition. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp 3687\u20133697","DOI":"10.18653\/v1\/D18-1404"},{"issue":"5","key":"11435_CR42","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1109\/TSA.2002.800560","volume":"10","author":"G Tzanetakis","year":"2002","unstructured":"Tzanetakis G, Cook P (2002) Musical genre classification of audio signals. 
IEEE Trans Speech Audio Process 10(5):293\u2013302","journal-title":"IEEE Trans Speech Audio Process"},{"key":"11435_CR43","doi-asserted-by":"crossref","unstructured":"Salamon J, Jacoby C, Bello JP (2014) A dataset and taxonomy for urban sound research. In: Proceedings of the 22nd ACM international conference on multimedia, pp 1041\u20131044","DOI":"10.1145\/2647868.2655045"},{"key":"11435_CR44","doi-asserted-by":"crossref","unstructured":"Piczak KJ (2015) Environmental sound classification with convolutional neural networks. Proceedings of the IEEE 25th international workshop on machine learning for signal processing (MLSP), pp 1\u20136","DOI":"10.1109\/MLSP.2015.7324337"},{"key":"11435_CR45","unstructured":"Chen D, Dolan WB (2011) Collecting highly parallel data for paraphrase evaluation. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp 190\u2013200"},{"key":"11435_CR46","unstructured":"Liu Y, Albanie S, Nagrani A, Zisserman A (2019) Use what you have: video retrieval using representations from collaborative experts. In: BMVC, pp 279"},{"key":"11435_CR47","unstructured":"Loshchilov I, Hutter F (2019) Decoupled weight decay regularization. In: international conference on learning representations"},{"key":"11435_CR48","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"11435_CR49","unstructured":"Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning, PMLR, vol 97 pp 6105\u20136114"},{"key":"11435_CR50","unstructured":"Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized Bert approach. 
arXiv preprint arXiv:1907.11692"},{"key":"11435_CR51","unstructured":"Sanh V, Debut L, Chaumond J, Wolf T (2019) Distilbert, a distilled version of Bert: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108"},{"key":"11435_CR52","unstructured":"Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov RR, Le QV (2019) Xlnet: Generalized autoregressive pretraining for language understanding. In: Wallach, H., Larochelle, H., Beygelzimer, A., Alch\u00e9-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol 32, pp 5754\u20135764. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2019\/file\/dc6a7e655d7e5840e66733e9ee67cc69-Paper.pdf"},{"key":"11435_CR53","doi-asserted-by":"crossref","unstructured":"Guzhov A, Raue F, Hees J, Dengel A (2021) Esresnet: environmental sound classification based on visual domain models. In: 2020 25th international conference on pattern recognition (ICPR), IEEE, pp 4933\u20134940","DOI":"10.1109\/ICPR48806.2021.9413035"},{"key":"11435_CR54","doi-asserted-by":"publisher","unstructured":"Gong Y, Chung Y-A, Glass J (2021) AST: audio spectrogram transformer. In: Proc. Interspeech 2021, pp 571\u2013575. https:\/\/doi.org\/10.21437\/Interspeech.2021-698","DOI":"10.21437\/Interspeech.2021-698"},{"key":"11435_CR55","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1016\/j.patrec.2022.07.012","volume":"161","author":"S Verbitskiy","year":"2022","unstructured":"Verbitskiy S, Berikov V, Vyshegorodtsev V (2022) ERANNS: efficient residual audio neural networks for audio pattern recognition. Pattern Recogn Letter 161:38\u201344","journal-title":"Pattern Recogn Letter"},{"key":"11435_CR56","doi-asserted-by":"crossref","unstructured":"Ma Y, Xu G, Sun X, Yan M, Zhang J, Ji R (2022) X-clip: end-to-end multi-grained contrastive learning for video-text retrieval. 
In: Proceedings of the 30th ACM international conference on multimedia, pp 638\u2013647","DOI":"10.1145\/3503161.3547910"},{"key":"11435_CR57","unstructured":"Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning, PMLR, vol. 119, pp 1597\u20131607"},{"key":"11435_CR58","unstructured":"Chen X, Fan H, Girshick R, He K (2020) Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297"},{"key":"11435_CR59","doi-asserted-by":"crossref","unstructured":"Amrani E, Ben-Ari R, Rotman D, Bronstein A (2021) Noise estimation using density estimation for self-supervised multimodal learning. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, pp 6644\u20136652","DOI":"10.1609\/aaai.v35i8.16822"},{"key":"11435_CR60","doi-asserted-by":"crossref","unstructured":"Bain M, Nagrani A, Varol G, Zisserman A (2021) Frozen in time: a joint video and image encoder for end-to-end retrieval. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 1728\u20131738","DOI":"10.1109\/ICCV48922.2021.00175"},{"key":"11435_CR61","doi-asserted-by":"crossref","unstructured":"Gabeur V, Sun C, Alahari K, Schmid C (2020) Multimodal transformer for video retrieval. In: Computer Vision\u2013ECCV 2020: 16th European conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Springer, Part IV 16, pp 214\u2013229","DOI":"10.1007\/978-3-030-58548-8_13"},{"key":"11435_CR62","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1016\/j.neucom.2022.07.028","volume":"508","author":"H Luo","year":"2022","unstructured":"Luo H, Ji L, Zhong M, Chen Y, Lei W, Duan N, Li T (2022) Clip4clip: an empirical study of clip for end to end video clip retrieval and captioning. 
Neurocomputing 508:293\u2013304","journal-title":"Neurocomputing"},{"key":"11435_CR63","doi-asserted-by":"crossref","unstructured":"Dzabraev M, Kalashnikov M, Komkov S, Petiushko A (2021) Mdmmt: Multidomain multimodal transformer for video retrieval. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 3354\u20133363","DOI":"10.1109\/CVPRW53098.2021.00374"},{"key":"11435_CR64","unstructured":"Bertasius G, Wang H, Torresani L (2021) Is space-time attention all you need for video understanding? In: ICML, 2:4"},{"key":"11435_CR65","unstructured":"Bao H, Dong L, Piao S, Wei F (2022) Beit: Bert pre-training of image transformers. In: international conference on learning representations (ICLR)"},{"key":"11435_CR66","unstructured":"Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, J\u00e9gou H (2021) Training data-efficient image transformers and distillation through attention. In: International conference on machine learning (ICML), vol 139, pp 10347\u201310357"},{"key":"11435_CR67","unstructured":"Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R (2020) Albert: a lite Bert for self-supervised learning of language representations. In: International conference on learning representations (ICLR)"},{"key":"11435_CR68","doi-asserted-by":"crossref","unstructured":"Wu Z, Palmer M (1994) Verbs semantics and lexical selection. 
In: Proceedings of the 32nd annual meeting on association for computational linguistics, Association for Computational Linguistics, pp 133\u2013138","DOI":"10.3115\/981732.981751"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-025-11435-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-025-11435-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-025-11435-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,7]],"date-time":"2025-09-07T00:17:57Z","timestamp":1757204277000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-025-11435-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,1]]},"references-count":68,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["11435"],"URL":"https:\/\/doi.org\/10.1007\/s00521-025-11435-8","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"value":"0941-0643","type":"print"},{"value":"1433-3058","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,1]]},"assertion":[{"value":"10 December 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 June 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 July 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors 
declare that they have no conflict of interest related to this work.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable, as this study does not involve any ethical issues or human\/animal subjects.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval and consent to participate"}},{"value":"All authors have reviewed and approved the manuscript for publication.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}]}}