{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T06:41:49Z","timestamp":1776494509348,"version":"3.51.2"},"reference-count":57,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2026,3,9]],"date-time":"2026-03-09T00:00:00Z","timestamp":1773014400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,3,9]],"date-time":"2026-03-09T00:00:00Z","timestamp":1773014400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Korea Advanced Institute of Science and Technology"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2026,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Large-scale vision-language models demonstrate strong multimodal alignment and generalization across diverse tasks. Among them, CLIP stands out as one of the most successful approaches. In this work, we extend the application of CLIP to sound source localization, proposing a self-supervised method that operates without explicit text input. We introduce a framework that maps audios into tokens compatible with CLIP\u2019s text encoder, producing audio-driven embeddings. These embeddings are used to generate sounding region masks, from which visual features are extracted and aligned with the audio embeddings through a contrastive audio-visual correspondence objective. Our findings show that the alignment knowledge of a pre-trained multimodal foundation model enables our method to generate more complete and compact localization for sounding objects. 
We further propose an LLM-guided extension that distills object-aware audio-visual scene understanding into the model during training to enhance alignment. Extensive experiments across five diverse tasks demonstrate that our method, in all variants, outperforms state-of-the-art approaches and achieves strong generalization in zero-shot settings.<\/jats:p>","DOI":"10.1007\/s11263-025-02687-x","type":"journal-article","created":{"date-parts":[[2026,3,9]],"date-time":"2026-03-09T17:32:36Z","timestamp":1773077556000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization"],"prefix":"10.1007","volume":"134","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-4138-3561","authenticated-orcid":false,"given":"Sooyoung","family":"Park","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Arda","family":"Senocak","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7741-7275","authenticated-orcid":false,"given":"Joon Son","family":"Chung","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2026,3,9]]},"reference":[{"key":"2687_CR1","doi-asserted-by":"crossref","unstructured":"Arandjelovi\u0107, R., & Zisserman, A. (2018). Objects that sound. In: European Conference on Computer Vision (ECCV).","DOI":"10.1007\/978-3-030-01246-5_27"},{"key":"2687_CR2","doi-asserted-by":"crossref","unstructured":"Bhati, S., Villalba, J., Moro-Velazquez, L., Thebaud, T., & Dehak, N. (2023). Segmental speechclip: Utilizing pretrained image-text models for audio-visual learning. In: INTERSPEECH.","DOI":"10.21437\/Interspeech.2023-135"},{"key":"2687_CR3","doi-asserted-by":"crossref","unstructured":"Cha, J., Mun, J., & Roh, B. 
(2023). Learning to generate text-grounded mask for open-world semantic segmentation from only image-text pairs. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52729.2023.01074"},{"key":"2687_CR4","doi-asserted-by":"crossref","unstructured":"Chen, H., Xie, W., Vedaldi, A., & Zisserman, A. (2020). Vggsound: A large-scale audio-visual dataset. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).","DOI":"10.1109\/ICASSP40776.2020.9053174"},{"key":"2687_CR5","doi-asserted-by":"crossref","unstructured":"Chen, H., Xie, W., Afouras, T., Nagrani, A., Vedaldi, A., & Zisserman, A. (2021). Localizing visual sounds the hard way. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR46437.2021.01659"},{"key":"2687_CR6","unstructured":"Chen, S., Wu, Y., Wang, C., Liu, S., Tompkins, D., Chen, Z., & Wei, F. (2023). BEATs: Audio pre-training with acoustic tokenizers. In: International Conference on Machine Learning (ICML)."},{"key":"2687_CR7","doi-asserted-by":"crossref","unstructured":"Chen, Y., Liu, Y., Wang, H., Liu, F., Wang, C., Frazer, H., & Carneiro, G. (2024). Unraveling instance associations: A closer look for audio-visual segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52733.2024.02502"},{"key":"2687_CR8","unstructured":"Dong, H. W., Takahashi, N., Mitsufuji, Y., McAuley, J., & Berg-Kirkpatrick, T. (2022). Clipsep: Learning text-queried sound separation with noisy unlabeled videos. In: International Conference on Learning Representations (ICLR)."},{"key":"2687_CR9","unstructured":"Fan, Y., Wu, Y., Lin, Y., & Du, B. (2023). Revisit weakly-supervised audio-visual video parsing from the language perspective. arXiv preprint arXiv:2306.00595."},{"key":"2687_CR10","doi-asserted-by":"crossref","unstructured":"Fedorishin, D., Mohan, D. D., Jawade, B., Setlur, S., & Govindaraju, V. (2023). 
Hear the flow: Optical flow-based self-supervised visual sound source localization. In: IEEE Winter Conference on Applications of Computer Vision (WACV).","DOI":"10.1109\/WACV56688.2023.00231"},{"key":"2687_CR11","doi-asserted-by":"crossref","unstructured":"Ghosh, S., Kumar, S., Seth, A., Evuru, C.K.R., Tyagi, U., Sakshi, S., Nieto, O., Duraiswami, R., & Manocha, D. (2024). Gama: A large audio-language model with advanced audio understanding and complex reasoning abilities. arXiv preprint arXiv:2406.11768.","DOI":"10.18653\/v1\/2024.emnlp-main.361"},{"key":"2687_CR12","doi-asserted-by":"crossref","unstructured":"Girdhar, R., El-Nouby, A., Liu, Z., Singh, M., Alwala, K. V., Joulin, A., & Misra, I. (2023). Imagebind: One embedding space to bind them all. (pp. 15180\u201315190) In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52729.2023.01457"},{"key":"2687_CR13","doi-asserted-by":"crossref","unstructured":"Guzhov, A., Raue, F., Hees, J., & Dengel, A. (2022). AudioCLIP: Extending CLIP to image, text and audio. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).","DOI":"10.1109\/ICASSP43922.2022.9747631"},{"key":"2687_CR14","doi-asserted-by":"crossref","unstructured":"Hamilton, M., Zisserman, A., Hershey, J. R., & Freeman, W. T. (2024). Separating the chirp from the chat: Self-supervised visual grounding of sound and language. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52733.2024.01246"},{"key":"2687_CR15","unstructured":"Hu, D., Qian, R., Jiang, M., Tan, X., Wen, S., Ding, E., Lin, W., & Dou, D. (2020). Discriminative sounding objects localization via self-supervised audiovisual matching. Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"2687_CR16","doi-asserted-by":"crossref","unstructured":"Hu, X., Chen, Z., & Owens, A. (2022). Mix and localize: Localizing sound sources in mixtures. 
In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52688.2022.01023"},{"key":"2687_CR17","doi-asserted-by":"crossref","unstructured":"Hu, X., Chen, Z., & Owens, A. (2022). Mix and localize: Localizing sound sources in mixtures. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52688.2022.01023"},{"key":"2687_CR18","unstructured":"Jang, E., Gu, S., & Poole, B. (2017). Categorical reparameterization with gumbel-softmax. In: International Conference on Learning Representations (ICLR)."},{"key":"2687_CR19","unstructured":"Jia, C., Yang, Y., Xia, Y., Chen, Y. T., Parekh, Z., Pham, H., Le, Q., Sung, Y. H., Li, Z., & Duerig, T. (2021). Scaling up visual and vision-language representation learning with noisy text supervision. In: International Conference on Machine Learning (ICML)."},{"key":"2687_CR20","unstructured":"Jiang, A.Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D.S., de\u00a0Las\u00a0Casas, D., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L.R., Lachaux, M.A., Stock, P., Scao, T.L., Lavril, T., Wang, T., Lacroix, T., & Sayed, W.E. (2023). Mistral 7b. arXiv preprint arXiv:2310.06825."},{"key":"2687_CR21","doi-asserted-by":"crossref","unstructured":"Kim, D., Um, S. J., Lee, S., & Kim, J. U. (2024). Learning to visually localize sound sources from mixtures without prior source knowledge. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52733.2024.02499"},{"key":"2687_CR22","unstructured":"Li, J., Li, D., Savarese, S., & Hoi, S. (2023). Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In: International Conference on Machine Learning (ICML)."},{"key":"2687_CR23","doi-asserted-by":"crossref","unstructured":"Li, J., Zhao, W., Huang, Z., Guo, Y., & Tian, Y. (2025). Do audio-visual segmentation models truly segment sounding objects? 
arXiv preprint arXiv:2502.00358.","DOI":"10.1609\/aaai.v40i8.37542"},{"key":"2687_CR24","doi-asserted-by":"crossref","unstructured":"Li, L.H., Zhang, P., Zhang, H., Yang, J., Li, C., Zhong, Y., Wang, L., Yuan, L., Zhang, L., Hwang, J.N., Chang, K.W., & Gao, J. (2022). Grounded language-image pre-training. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52688.2022.01069"},{"key":"2687_CR25","doi-asserted-by":"crossref","unstructured":"Li, S., Tian, Y., & Xu, C. (2021). Space-time memory network for sounding object localization in videos. In: British Machine Vision Conference (BMVC).","DOI":"10.5244\/C.35.116"},{"key":"2687_CR26","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2022.103602","volume-title":"Unsupervised sound localization via iterative contrastive learning","author":"YB Lin","year":"2023","unstructured":"Lin, Y. B., Tseng, H. Y., Lee, H. Y., Lin, Y. Y., & Yang, M. H. (2023). Unsupervised sound localization via iterative contrastive learning. Computer Vision and Image Understanding (CVIU)."},{"key":"2687_CR27","doi-asserted-by":"crossref","unstructured":"Liu, J., Ju, C., Xie, W., & Zhang, Y. (2022). Exploiting transformation invariance and equivariance for self-supervised sound localisation. In: ACM MM.","DOI":"10.1145\/3503161.3548317"},{"key":"2687_CR28","doi-asserted-by":"crossref","unstructured":"L\u00fcddecke, T., & Ecker, A. (2022). Image segmentation using text and image prompts. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52688.2022.00695"},{"key":"2687_CR29","doi-asserted-by":"crossref","unstructured":"Mahmud, T., & Marculescu, D. (2023). Ave-clip: Audioclip-based multi-window temporal transformer for audio visual event localization. In: IEEE Winter Conference on Applications of Computer Vision (WACV).","DOI":"10.1109\/WACV56688.2023.00513"},{"key":"2687_CR30","doi-asserted-by":"crossref","unstructured":"Mahmud, T., Tian, Y., & Marculescu, D. 
(2024). T-vsl: Text-guided visual sound source localization in mixtures. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52733.2024.02525"},{"key":"2687_CR31","doi-asserted-by":"crossref","unstructured":"Mo, S., & Morgado, P. (2022). A closer look at weakly-supervised audio-visual source localization. Advances in Neural Information Processing Systems (NeurIPS).","DOI":"10.52202\/068431-2720"},{"key":"2687_CR32","doi-asserted-by":"crossref","unstructured":"Mo, S., & Morgado, P. (2022). Localizing visual sounds the easy way. In: European Conference on Computer Vision (ECCV).","DOI":"10.1007\/978-3-031-19836-6_13"},{"key":"2687_CR33","doi-asserted-by":"crossref","unstructured":"Mo, S., & Tian, Y. (2023). Audio-visual grouping network for sound localization from mixtures. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52729.2023.01018"},{"key":"2687_CR34","doi-asserted-by":"crossref","unstructured":"Nukrai, D., Mokady, R., & Globerson, A. (2022). Text-only training for image captioning using noise-injected clip. In: Empirical Methods in Natural Language Processing (EMNLP).","DOI":"10.18653\/v1\/2022.findings-emnlp.299"},{"key":"2687_CR35","doi-asserted-by":"crossref","unstructured":"Oya, T., Iwase, S., Natsume, R., Itazuri, T., Yamaguchi, S., & Morishima, S. (2020). Do we need sound for sound source localization? In: Asia Conference on Computer Vision (ACCV).","DOI":"10.1007\/978-3-030-69544-6_8"},{"key":"2687_CR36","doi-asserted-by":"crossref","unstructured":"Park, S., Senocak, A., & Chung, J. S. (2023). MarginNCE: Robust sound localization with a negative margin. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).","DOI":"10.1109\/ICASSP49357.2023.10097234"},{"key":"2687_CR37","doi-asserted-by":"crossref","unstructured":"Park, S., Senocak, A., & Chung, J. S. (2024). Can clip help sound source localization? 
In: IEEE Winter Conference on Applications of Computer Vision (WACV).","DOI":"10.1109\/WACV57701.2024.00561"},{"key":"2687_CR38","doi-asserted-by":"crossref","unstructured":"Qian, R., Hu, D., Dinkel, H., Wu, M., Xu, N., & Lin, W. (2020). Multiple sound sources localization from coarse to fine. In: European Conference on Computer Vision (ECCV).","DOI":"10.1007\/978-3-030-58565-5_18"},{"key":"2687_CR39","unstructured":"Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners."},{"key":"2687_CR40","unstructured":"Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., & Clark, J. (2021). Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning (ICML)."},{"key":"2687_CR41","doi-asserted-by":"crossref","unstructured":"Ryu, H., Kim, S., Chung, J.S., & Senocak, A. (2025). Seeing speech and sound: Distinguishing and locating audios in visual scenes. arXiv preprint arXiv:2503.18880.","DOI":"10.1109\/CVPR52734.2025.01264"},{"key":"2687_CR42","doi-asserted-by":"crossref","unstructured":"Senocak, A., Oh, T. H., Kim, J., Yang, M. H., & Kweon, I. S. (2018). Learning to localize sound source in visual scenes. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR.2018.00458"},{"issue":"5","key":"2687_CR43","doi-asserted-by":"publisher","first-page":"1605","DOI":"10.1109\/TPAMI.2019.2952095","volume":"43","author":"A Senocak","year":"2021","unstructured":"Senocak, A., Oh, T. H., Kim, J., Yang, M. H., & Kweon, I. S. (2021). Learning to localize sound sources in visual scenes: Analysis and applications. 
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 43(5), 1605\u20131619.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"key":"2687_CR44","doi-asserted-by":"crossref","unstructured":"Senocak, A., Ryu, H., Kim, J., & Kweon, I. S. (2022). Learning sound localization better from semantically similar samples. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).","DOI":"10.1109\/ICASSP43922.2022.9747867"},{"key":"2687_CR45","doi-asserted-by":"crossref","unstructured":"Senocak, A., Ryu, H., Kim, J., & Kweon, I. S. (2022). Less can be more: Sound source localization with a classification model. In: IEEE Winter Conference on Applications of Computer Vision (WACV).","DOI":"10.1109\/WACV51458.2022.00065"},{"key":"2687_CR46","doi-asserted-by":"crossref","unstructured":"Senocak, A., Ryu, H., Kim, J., Oh, T. H., Pfister, H., & Chung, J. S. (2023). Sound source localization is all about cross-modal alignment. In: IEEE International Conference on Computer Vision (ICCV).","DOI":"10.1109\/ICCV51070.2023.00715"},{"key":"2687_CR47","doi-asserted-by":"crossref","unstructured":"Senocak, A., Ryu, H., Kim, J., Oh, T. H., Pfister, H., & Chung, J. S. (2025). Toward interactive sound source localization. Better align sight and sound! IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).","DOI":"10.1109\/TPAMI.2025.3573994"},{"key":"2687_CR48","unstructured":"Song, Z., Wang, Y., Fan, J., Tan, T., & Zhang, Z. (2022). Self-supervised predictive learning: A negative-free method for sound source localization in visual scenes. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"2687_CR49","doi-asserted-by":"crossref","unstructured":"Sun, W., Zhang, J., Wang, J., Liu, Z., Zhong, Y., Feng, T., Guo, Y., Zhang, Y., & Barnes, N. (2023). Learning audio-visual source localization via false negative aware contrastive learning. 
In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52729.2023.00621"},{"key":"2687_CR50","unstructured":"Sung-Bin, K., Hyun-Bin, O., Lee, J., Senocak, A., Chung, J. S., & Oh, T. H. (2025). Avhbench: A cross-modal hallucination benchmark for audio-visual large language models. In: International Conference on Learning Representations (ICLR)."},{"key":"2687_CR51","doi-asserted-by":"crossref","unstructured":"Tan, R., Ray, A., Burns, A., Plummer, B. A., Salamon, J., Nieto, O., Russell, B., & Saenko, K. (2023). Language-guided audio-visual source separation via trimodal consistency. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52729.2023.01019"},{"key":"2687_CR52","doi-asserted-by":"crossref","unstructured":"Wu, H. H., Seetharaman, P., Kumar, K., & Bello, J. P. (2022). Wav2CLIP: Learning robust audio representations from CLIP. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).","DOI":"10.31219\/osf.io\/r2vwf"},{"key":"2687_CR53","doi-asserted-by":"crossref","unstructured":"Wu, Y., Chen, K., Zhang, T., Hui, Y., Berg-Kirkpatrick, T., & Dubnov, S. (2023). Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).","DOI":"10.1109\/ICASSP49357.2023.10095969"},{"key":"2687_CR54","doi-asserted-by":"crossref","unstructured":"Xie, J., Hou, X., Ye, K., & Shen, L. (2022). Clims: Cross language image matching for weakly supervised semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52688.2022.00444"},{"key":"2687_CR55","doi-asserted-by":"crossref","unstructured":"Xuan, H., Wu, Z., Yang, J., Yan, Y., & Alameda-Pineda, X. (2022). A proposal-based paradigm for self-supervised sound source localization in videos. 
In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52688.2022.00110"},{"key":"2687_CR56","doi-asserted-by":"crossref","unstructured":"Yariv, G., Gat, I., Wolf, L., Adi, Y., & Schwartz, I. (2023). AudioToken: Adaptation of text-conditioned diffusion models for audio-to-image generation. In: INTERSPEECH.","DOI":"10.21437\/Interspeech.2023-852"},{"key":"2687_CR57","doi-asserted-by":"crossref","unstructured":"Zhou, J., Wang, J., Zhang, J., Sun, W., Zhang, J., Birchfield, S., Guo, D., Kong, L., Wang, M., & Zhong, Y. (2022). Audio-visual segmentation. In: European Conference on Computer Vision (ECCV).","DOI":"10.1007\/978-3-031-19836-6_22"}],"updated-by":[{"DOI":"10.1007\/s11263-026-02836-w","type":"correction","label":"Correction","source":"publisher","updated":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T00:00:00Z","timestamp":1775865600000}}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02687-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-025-02687-x","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02687-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T05:47:28Z","timestamp":1776491248000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-025-02687-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,9]]},"references-count":57,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2026,4]]}},"alternative-id":["2687"],"URL":"https:\/\/doi.org\/10.1007\/s11263-025-02687-x","relation":{
},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,9]]},"assertion":[{"value":"3 May 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 September 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 March 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 April 2026","order":5,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":6,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The original online version of this article was revised due to update in authors affiliations and few typo in the equations.","order":7,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 April 2026","order":8,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Correction","order":9,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"A Correction to this paper has been published:","order":10,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"https:\/\/doi.org\/10.1007\/s11263-026-02836-w","URL":"https:\/\/doi.org\/10.1007\/s11263-026-02836-w","order":11,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"179"}}