{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,17]],"date-time":"2026-07-17T06:07:40Z","timestamp":1784268460993,"version":"3.55.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"DOI":"10.13039\/100014440","name":"Ministerio de Ciencia, Innovaci\u00f3n y Universidades","doi-asserted-by":"publisher","award":["PID2022-141539NB-I00"],"award-info":[{"award-number":["PID2022-141539NB-I00"]}],"id":[{"id":"10.13039\/100014440","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100014440","name":"Ministerio de Ciencia, Innovaci\u00f3n y Universidades","doi-asserted-by":"publisher","award":["FPU20\/02340"],"award-info":[{"award-number":["FPU20\/02340"]}],"id":[{"id":"10.13039\/100014440","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010067","name":"Gobierno de Arag\u00f3n","doi-asserted-by":"publisher","award":["Graphics and Imaging Lab (ref T34_23R)"],"award-info":[{"award-number":["Graphics and Imaging Lab (ref T34_23R)"]}],"id":[{"id":"10.13039\/501100010067","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010067","name":"Gobierno de Arag\u00f3n","doi-asserted-by":"publisher","award":["HUMAN-VR: Development of a Computational Model for Virtual Reality Perception PROY_T25_24"],"award-info":[{"award-number":["HUMAN-VR: Development of a Computational Model for Virtual Reality Perception PROY_T25_24"]}],"id":[{"id":"10.13039\/501100010067","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:p>Selection is the first step in many image editing processes, enabling faster and simpler modifications of all pixels sharing a common modality. In this work, we present a method for material selection in images, robust to lighting and reflectance variations, which can be used for downstream editing tasks. We rely on vision transformer (ViT) models and leverage their features for selection, proposing a multi-resolution processing strategy that yields finer and more stable selection results than prior methods. Furthermore, we enable selection at two levels: texture and subtexture, leveraging a new two-level material selection (DuMaS) dataset which includes dense annotations for over 800,000 synthetic images, both on the texture and subtexture levels.<\/jats:p>","DOI":"10.1145\/3763332","type":"journal-article","created":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T17:15:39Z","timestamp":1764868539000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Fine-Grained Spatially Varying Material Selection in Images"],"prefix":"10.1145","volume":"44","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2077-683X","authenticated-orcid":false,"given":"Julia","family":"Guerrero-Viu","sequence":"first","affiliation":[{"name":"Universidad de Zaragoza - I3A, Zaragoza, Spain"},{"name":"Adobe Research, London, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2610-4831","authenticated-orcid":false,"given":"Michael","family":"Fischer","sequence":"additional","affiliation":[{"name":"Adobe Research, London, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9655-2138","authenticated-orcid":false,"given":"Iliyan","family":"Georgiev","sequence":"additional","affiliation":[{"name":"Adobe Research, London, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3509-8485","authenticated-orcid":false,"given":"Elena","family":"Garces","sequence":"additional","affiliation":[{"name":"Adobe Research, Paris, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7503-7022","authenticated-orcid":false,"given":"Diego","family":"Gutierrez","sequence":"additional","affiliation":[{"name":"Universidad de Zaragoza - I3A, Zaragoza, Spain"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0060-7278","authenticated-orcid":false,"given":"Belen","family":"Masia","sequence":"additional","affiliation":[{"name":"Universidad de Zaragoza - I3A, Zaragoza, Spain"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6219-3747","authenticated-orcid":false,"given":"Valentin","family":"Deschaintre","sequence":"additional","affiliation":[{"name":"Adobe Research, London, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,12,4]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"2024. Evermotion Arch Interior. https:\/\/evermotion.org\/shop\/cat\/397\/archinteriors."},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766967"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2461912.2462002"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298970"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.1998.710790"},{"key":"e_1_2_2_6_1","volume-title":"Contrastive lift: 3d object instance segmentation by slow-fast contrastive fusion. arXiv preprint arXiv:2306.04633","author":"Bhalgat Yash","year":"2023","unstructured":"Yash Bhalgat, Iro Laina, Joao F Henriques, Andrew Zisserman, and Andrea Vedaldi. 2023. Contrastive lift: 3d object instance segmentation by slow-fast contrastive fusion. arXiv preprint arXiv:2306.04633 (2023)."},{"key":"e_1_2_2_7_1","volume-title":"Window Attention is Bugged: How not to Interpolate Position Embeddings. arXiv preprint arXiv:2311.05613","author":"Bolya Daniel","year":"2023","unstructured":"Daniel Bolya, Chaitanya Ryali, Judy Hoffman, and Christoph Feichtenhofer. 2023. Window Attention is Bugged: How not to Interpolate Position Embeddings. arXiv preprint arXiv:2311.05613 (2023)."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1167\/3.8.2"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00951"},{"key":"e_1_2_2_10_1","volume-title":"Deep convolutional filter banks for texture recognition and segmentation. arXiv preprint arXiv:1411.6836","author":"Cimpoi Mircea","year":"2014","unstructured":"Mircea Cimpoi, Subhransu Maji, and Andrea Vedaldi. 2014. Deep convolutional filter banks for texture recognition and segmentation. arXiv preprint arXiv:1411.6836 (2014)."},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201378"},{"key":"e_1_2_2_12_1","volume-title":"Learning Zero-Shot Material States Segmentation, by Implanting Natural Image Patterns in Synthetic Data. arXiv preprint arXiv:2403.03309","author":"Eppel Sagi","year":"2024","unstructured":"Sagi Eppel, Jolina Li, Manuel Drehwald, and Alan Aspuru-Guzik. 2024. Learning Zero-Shot Material States Segmentation, by Implanting Natural Image Patterns in Synthetic Data. arXiv preprint arXiv:2403.03309 (2024)."},{"key":"e_1_2_2_13_1","volume-title":"SAMa: Material-aware 3D Selection and Segmentation. arXiv preprint arXiv:2411.19322","author":"Fischer Michael","year":"2024","unstructured":"Michael Fischer, Iliyan Georgiev, Thibault Groueix, Vladimir G Kim, Tobias Ritschel, and Valentin Deschaintre. 2024. SAMa: Material-aware 3D Selection and Segmentation. arXiv preprint arXiv:2411.19322 (2024)."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1167\/3.5.3"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/3DV57658.2022.00042"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.1973.4309314"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02034"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"e_1_2_2_19_1","volume-title":"Intrinsic Image Diffusion for Indoor Single-view Material Estimation. Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Kocsis Peter","year":"2024","unstructured":"Peter Kocsis, Vincent Sitzmann, and Matthias Niessner. 2024. Intrinsic Image Diffusion for Indoor Single-view Material Estimation. Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3323036"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/636886.636891"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1011126920638"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8655(02)00393-8"},{"key":"e_1_2_2_24_1","volume-title":"Computer Graphics Forum","author":"Morreale Luca","unstructured":"Luca Morreale, Noam Aigerman, Vladimir G Kim, and Niloy J Mitra. 2024. Neural semantic surface maps. In Computer Graphics Forum, Vol. 43. Wiley Online Library, e15005."},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00418"},{"key":"e_1_2_2_26_1","volume-title":"2019 IEEE International Conference on Computer Vision (ICCV).","author":"Murmann Lukas","year":"2019","unstructured":"Lukas Murmann, Michael Gharbi, Miika Aittala, and Fredo Durand. 2019b. A Multi-Illumination Dataset of Indoor Object Appearance. In 2019 IEEE International Conference on Computer Vision (ICCV)."},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.2312\/sr.20211292"},{"key":"e_1_2_2_28_1","unstructured":"Maxime Oquab Timoth\u00e9e Darcet Th\u00e9o Moutakanni Huy Vo Marc Szafraniec Vasil Khalidov Pierre Fernandez Daniel Haziza Francisco Massa Alaaeldin El-Nouby et al. 2024. DINOv2: Learning Robust Visual Features without Supervision. Transactions on Machine Learning Research Journal (2024) 1\u201331."},{"key":"e_1_2_2_29_1","unstructured":"Nikhila Ravi Valentin Gabeur Yuan-Ting Hu Ronghang Hu Chaitanya Ryali Tengyu Ma Haitham Khedr Roman R\u00e4dle Chloe Rolland Laura Gustafson et al. 2024. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714 (2024)."},{"key":"e_1_2_2_30_1","volume-title":"International Conference on Machine Learning. PMLR, 29441\u201329454","author":"Ryali Chaitanya","year":"2023","unstructured":"Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, et al. 2023. Hiera: A hierarchical vision transformer without the bells-and-whistles. In International Conference on Machine Learning. PMLR, 29441\u201329454."},{"key":"e_1_2_2_31_1","volume-title":"The perception of material qualities in real-world images. Ph. D. Dissertation","author":"Sharan Lavanya","unstructured":"Lavanya Sharan. 2009. The perception of material qualities in real-world images. Ph. D. Dissertation. Massachusetts Institute of Technology."},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-013-0609-0"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1167\/14.9.12"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592390"},{"key":"e_1_2_2_35_1","volume-title":"European Conference on Computer Vision. Springer, 444\u2013462","author":"Shi Baifeng","year":"2024","unstructured":"Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, and Trevor Darrell. 2024. When do we not need larger vision models?. In European Conference on Computer Vision. Springer, 444\u2013462."},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.3390\/jimaging8070186"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/3DV.2017.00067"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20074-8_26"},{"key":"e_1_2_2_39_1","volume-title":"Attention is all you need. Advances in Neural Information Processing Systems","author":"Vaswani A","year":"2017","unstructured":"A Vaswani. 2017. Attention is all you need. Advances in Neural Information Processing Systems (2017)."},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02087"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00541"},{"key":"e_1_2_2_42_1","volume-title":"Proceedings, Part III 14","author":"Wang Ting-Chun","year":"2016","unstructured":"Ting-Chun Wang, Jun-Yan Zhu, Ebi Hiroaki, Manmohan Chandraker, Alexei A Efros, and Ravi Ramamoorthi. 2016. A 4D light-field dataset and CNN architectures for material recognition. In Computer Vision-ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11\u201314, 2016, Proceedings, Part III 14. Springer, 121\u2013138."},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3025121"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657445"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00284"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3763332","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T21:12:20Z","timestamp":1764969140000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3763332"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12]]},"references-count":45,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["10.1145\/3763332"],"URL":"https:\/\/doi.org\/10.1145\/3763332","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12]]},"assertion":[{"value":"2025-05-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-09","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-12-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}