{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:09:17Z","timestamp":1750219757468,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,10,9]],"date-time":"2023-10-09T00:00:00Z","timestamp":1696809600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,10,9]]},"DOI":"10.1145\/3577190.3614176","type":"proceedings-article","created":{"date-parts":[[2023,10,7]],"date-time":"2023-10-07T22:30:48Z","timestamp":1696717848000},"page":"24-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["AIUnet: Asymptotic inference with U2-Net for referring image segmentation"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4809-1864","authenticated-orcid":false,"given":"Jiangquan","family":"Li","sequence":"first","affiliation":[{"name":"SCHOOL OF SOFTWARE TECHNOLOGY, DALIAN UNIVERSITY OF TECHNOLOGY, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0864-7258","authenticated-orcid":false,"given":"Shimin","family":"Shan","sequence":"additional","affiliation":[{"name":"SCHOOL OF SOFTWARE TECHNOLOGY, DALIAN UNIVERSITY OF TECHNOLOGY, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8013-4372","authenticated-orcid":false,"given":"Yu","family":"Liu","sequence":"additional","affiliation":[{"name":"SCHOOL OF SOFTWARE TECHNOLOGY, DALIAN UNIVERSITY OF TECHNOLOGY, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7388-7878","authenticated-orcid":false,"given":"Kaiping","family":"Xu","sequence":"additional","affiliation":[{"name":"SCHOOL OF SOFTWARE TECHNOLOGY, DALIAN UNIVERSITY OF TECHNOLOGY, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-4053-5438","authenticated-orcid":false,"given":"Xiwen","family":"Hu","sequence":"additional","affiliation":[{"name":"SCHOOL OF SOFTWARE TECHNOLOGY, DALIAN UNIVERSITY OF TECHNOLOGY, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6614-8871","authenticated-orcid":false,"given":"Mingcheng","family":"Xue","sequence":"additional","affiliation":[{"name":"SCHOOL OF SOFTWARE TECHNOLOGY, DALIAN UNIVERSITY OF TECHNOLOGY, China"}]}],"member":"320","published-online":{"date-parts":[[2023,10,9]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Dense-UNet: a novel multiphoton in vivo cellular image segmentation model based on a convolutional neural network. Quantitative imaging in medicine and surgery 10, 6","author":"Cai Sijing","year":"2020","unstructured":"Sijing Cai , Yunxian Tian , Harvey Lui , Haishan Zeng , Yi Wu , and Guannan Chen . 2020. Dense-UNet: a novel multiphoton in vivo cellular image segmentation model based on a convolutional neural network. Quantitative imaging in medicine and surgery 10, 6 ( 2020 ), 1275. Sijing Cai, Yunxian Tian, Harvey Lui, Haishan Zeng, Yi Wu, and Guannan Chen. 2020. Dense-UNet: a novel multiphoton in vivo cellular image segmentation model based on a convolutional neural network. Quantitative imaging in medicine and surgery 10, 6 (2020), 1275."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00755"},{"key":"e_1_3_2_1_3_1","volume-title":"Transunet: Transformers make strong encoders for medical image segmentation. 
arXiv preprint arXiv:2102.04306","author":"Chen Jieneng","year":"2021","unstructured":"Jieneng Chen , Yongyi Lu , Qihang Yu , Xiangde Luo , Ehsan Adeli , Yan Wang , Le Lu , Alan\u00a0 L Yuille , and Yuyin Zhou . 2021 . Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021). Jieneng Chen, Yongyi Lu, Qihang Yu, Xiangde Luo, Ehsan Adeli, Yan Wang, Le Lu, Alan\u00a0L Yuille, and Yuyin Zhou. 2021. Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)."},{"key":"e_1_3_2_1_4_1","volume-title":"Referring expression object segmentation with caption-aware consistency. arXiv preprint arXiv:1910.04748","author":"Chen Yi-Wen","year":"2019","unstructured":"Yi-Wen Chen , Yi-Hsuan Tsai , Tiantian Wang , Yen-Yu Lin , and Ming-Hsuan Yang . 2019. Referring expression object segmentation with caption-aware consistency. arXiv preprint arXiv:1910.04748 ( 2019 ). Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin, and Ming-Hsuan Yang. 2019. Referring expression object segmentation with caption-aware consistency. arXiv preprint arXiv:1910.04748 (2019)."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01601"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00491"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01525"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_7"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00448"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01050"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58607-2_4"},{"key":"e_1_3_2_1_12_1","volume-title":"MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural networks 121","author":"Ibtehaz Nabil","year":"2020","unstructured":"Nabil Ibtehaz and M\u00a0Sohel Rahman . 2020. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural networks 121 ( 2020 ), 74\u201387. Nabil Ibtehaz and M\u00a0Sohel Rahman. 2020. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural networks 121 (2020), 74\u201387."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CBMS49503.2020.00111"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475222"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00973"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01761"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00602"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.143"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3414006"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01005"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.9"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01252-6_39"},{"key":"e_1_3_2_1_24_1","volume-title":"V-net: Fully convolutional neural networks for volumetric medical image segmentation. 
In 2016 fourth international conference on 3D vision (3DV)","author":"Milletari Fausto","year":"2016","unstructured":"Fausto Milletari , Nassir Navab , and Seyed-Ahmad Ahmadi . 2016 . V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 fourth international conference on 3D vision (3DV) . IEEE , 565\u2013571. Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. 2016. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 fourth international conference on 3D vision (3DV). IEEE, 565\u2013571."},{"key":"e_1_3_2_1_25_1","volume-title":"Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999","author":"Oktay Ozan","year":"2018","unstructured":"Ozan Oktay , Jo Schlemper , Loic\u00a0Le Folgoc , Matthew Lee , Mattias Heinrich , Kazunari Misawa , Kensaku Mori , Steven McDonagh , Nils\u00a0 Y Hammerla , Bernhard Kainz , 2018. Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 ( 2018 ). Ozan Oktay, Jo Schlemper, Loic\u00a0Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils\u00a0Y Hammerla, Bernhard Kainz, 2018. Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018)."},{"key":"e_1_3_2_1_26_1","volume-title":"U2-Net: Going deeper with nested U-structure for salient object detection. Pattern recognition 106","author":"Qin Xuebin","year":"2020","unstructured":"Xuebin Qin , Zichen Zhang , Chenyang Huang , Masood Dehghan , Osmar\u00a0 R Zaiane , and Martin Jagersand . 2020. U2-Net: Going deeper with nested U-structure for salient object detection. Pattern recognition 106 ( 2020 ), 107404. Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar\u00a0R Zaiane, and Martin Jagersand. 2020. U2-Net: Going deeper with nested U-structure for salient object detection. Pattern recognition 106 (2020), 107404."},{"key":"e_1_3_2_1_27_1","volume-title":"International Conference on Machine Learning. PMLR, 8748\u20138763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford , Jong\u00a0Wook Kim , Chris Hallacy , Aditya Ramesh , Gabriel Goh , Sandhini Agarwal , Girish Sastry , Amanda Askell , Pamela Mishkin , Jack Clark , 2021 . Learning transferable visual models from natural language supervision . In International Conference on Machine Learning. PMLR, 8748\u20138763 . Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. 
PMLR, 8748\u20138763."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_3"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01139"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ITME.2018.00080"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01111"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01762"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.2971171"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01075"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00142"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_5"}],"event":{"name":"ICMI '23: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"location":"Paris France","acronym":"ICMI '23"},"container-title":["INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577190.3614176","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3577190.3614176","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:02Z","timestamp":1750178222000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577190.3614176"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,9]]},"references-count":37,"alternative-id":["10.1145\/3577190.3614176","10.1145\/3577190"],"URL":"https:\/\/doi.org\/10.1145\/3577190.3614176","relation":{},"subject":[],"published":{"date-parts":[[2023,10,9]]},"assertion":[{"value":"2023-10-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
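The record above is a standard Crossref REST API work payload (a "status"/"message-type"/"message" envelope around the work object). A minimal sketch of fetching and reading such a record in Python follows; the public api.crossref.org endpoint is real, but the use of the requests library and the specific fields printed are illustrative assumptions, not part of the record itself:

import requests

# Fetch the work record for the DOI in this payload from the public
# Crossref REST API (https://api.crossref.org/works/{doi}).
DOI = "10.1145/3577190.3614176"
resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]  # same shape as the "message" object above

# Read a few of the fields present in the record; .get() guards against
# optional fields that other work types may omit.
print(work["title"][0])  # paper title
print(", ".join(f'{a.get("given", "")} {a.get("family", "")}'.strip()
                for a in work.get("author", [])))
print("references:", work.get("reference-count"))
print("pages:", work.get("page"))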