{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T23:35:42Z","timestamp":1761176142289,"version":"build-2065373602"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686318","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:00:00Z","timestamp":1761004800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,21]]},"abstract":"<jats:p>Vision-Language Pretrained Models (VLPs) have shown remarkable success in transferring knowledge to various downstream tasks, ranging from image-level tasks such as classification to pixel-level tasks such as Semantic Segmentation. However, a persistent challenge in the latter dense prediction tasks is the misalignment between pixel and text features. This mismatch hinders effective fusion between visual and textual representations, and leads to suboptimal predictions. While some studies attribute this to a Modality Gap, where vision and language modalities form distinct clusters within the shared feature space, we argue that the key issue is semantic misalignment, where the pixel features do not accurately reflect the concepts encoded by the text features. To achieve a stronger semantic alignment between pixels and text embeddings, in this work we propose a Mask-Text Contrastive (MTC) module that explicitly enforces an alignment between image regions and their corresponding semantic concepts. This is achieved by projecting both pixel and text features into a common space where an InfoNCE-based loss promotes semantic correspondence, reducing the modality gap as a side effect. Our approach can be seamlessly integrated into state-of-the-art VLP-based segmentation architectures, requiring only a lightweight linear projection and introducing minimal computational overhead at inference time. Experiments show that the MTC module consistently improves segmentation performance in benchmarks such as ADE20K, COCO-Stuff 10k and Pascal Context. Further experiments with COCO show that MTC is also effective in other downstream dense tasks such as object detection and instance segmentation. The repository associated with this work is available at https:\/\/github.com\/fedasaro62\/mask-text-contrastive-fully-seg.<\/jats:p>","DOI":"10.3233\/faia250887","type":"book-chapter","created":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:45:10Z","timestamp":1761126310000},"source":"Crossref","is-referenced-by-count":0,"title":["Investigating Mask-Text Contrastive Alignment in Semantic Segmentation"],"prefix":"10.3233","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-8727-3393","authenticated-orcid":false,"given":"Federico","family":"D\u2019Asaro","sequence":"first","affiliation":[{"name":"LINKS Foundation \u2013 AI, Data & Space (ADS)"},{"name":"Politecnico di Torino \u2013 Dipartimento di Automatica e Informatica (DAUIN)"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8894-5089","authenticated-orcid":false,"given":"Andrea","family":"Bottino","sequence":"additional","affiliation":[{"name":"Politecnico di Torino \u2013 Dipartimento di Automatica e Informatica (DAUIN)"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0083-813X","authenticated-orcid":false,"given":"Giuseppe","family":"Rizzo","sequence":"additional","affiliation":[{"name":"LINKS Foundation \u2013 AI, Data & Space (ADS)"},{"name":"Politecnico di Torino \u2013 Dipartimento di Automatica e Informatica (DAUIN)"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 2025"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA250887","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:45:10Z","timestamp":1761126310000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA250887"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,21]]},"ISBN":["9781643686318"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia250887","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,21]]}}}