{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:50:07Z","timestamp":1773802207693,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"16","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>Existing state-of-the-art image tokenization methods leverage diverse semantic features from pre-trained vision models for additional supervision, to expand the distribution of latent representations and thereby improve the quality of image reconstruction and generation.\nThese methods employ a locally supervised approach for semantic supervision, which limits the uniformity of semantic distribution. However, VA-VAE proves that a more uniform feature distribution yields better generation performance.\nIn this work, we introduce a Global Perspective Tokenizer (GloTok), which utilizes global relational information to model a more uniform semantic distribution of tokenized features.\nSpecifically, a codebook-wise histogram relation learning method is proposed to transfer the semantics, which are modeled by pre-trained models on the entire dataset, to the semantic codebook.\nThen, we design a residual learning module which recovers the fine-grained details to minimize the reconstruction error caused by quantization.\nThrough the above design, GloTok delivers more uniformly distributed semantic latent representations, which facilitates the training of autoregressive (AR) models for generating high-quality images without requiring direct access to pre-trained models during the training process.\nExperiments on the standard ImageNet-1k benchmark clearly show that our proposed method achieves state-of-the-art reconstruction performance and generation quality.<\/jats:p>","DOI":"10.1609\/aaai.v40i16.38330","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:26:26Z","timestamp":1773793586000},"page":"13280-13288","source":"Crossref","is-referenced-by-count":0,"title":["GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation"],"prefix":"10.1609","volume":"40","author":[{"given":"Xuan","family":"Zhao","sequence":"first","affiliation":[]},{"given":"Zhongyu","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Yuge","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Yuxi","family":"Mi","sequence":"additional","affiliation":[]},{"given":"Guodong","family":"Mu","sequence":"additional","affiliation":[]},{"given":"Shouhong","family":"Ding","sequence":"additional","affiliation":[]},{"given":"Jun","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Rizen","family":"Guo","sequence":"additional","affiliation":[]},{"given":"Shuigeng","family":"Zhou","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38330\/42292","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38330\/42292","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:26:26Z","timestamp":1773793586000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/38330"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i16.38330","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}