{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T05:14:17Z","timestamp":1780636457293,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":94,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T00:00:00Z","timestamp":1634428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Key Program of Natural Science Project of Educational Commission of Anhui Province","award":["KJ2019A0034"],"award-info":[{"award-number":["KJ2019A0034"]}]},{"name":"Natural Science Foundation of Anhui Province","award":["1908085MF182"],"award-info":[{"award-number":["1908085MF182"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62006002"],"award-info":[{"award-number":["62006002"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,17]]},"DOI":"10.1145\/3474085.3475601","type":"proceedings-article","created":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T06:57:34Z","timestamp":1634540254000},"page":"4481-4490","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":149,"title":["TriTransNet"],"prefix":"10.1145","author":[{"given":"Zhengyi","family":"Liu","sequence":"first","affiliation":[{"name":"Anhui University, Hefei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuan","family":"Wang","sequence":"additional","affiliation":[{"name":"Anhui University, Hefei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhengzheng","family":"Tu","sequence":"additional","affiliation":[{"name":"Anhui University, Hefei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yun","family":"Xiao","sequence":"additional","affiliation":[{"name":"Anhui University, Hefei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bin","family":"Tang","sequence":"additional","affiliation":[{"name":"Hefei University, Hefei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,10,17]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206596"},{"key":"e_1_3_2_1_2_1","volume-title":"Jamie Ryan Kiros, and Geoffrey E Hinton","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba , Jamie Ryan Kiros, and Geoffrey E Hinton . 2016 . Layer normalization. arXiv preprint arXiv:1607.06450 (2016). Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016)."},{"key":"e_1_3_2_1_3_1","volume-title":"Salient object detection: A benchmark","author":"Borji Ali","year":"2015","unstructured":"Ali Borji , Ming-Ming Cheng , Huaizu Jiang , and Jia Li. 2015. Salient object detection: A benchmark . IEEE transactions on image processing, Vol. 24 , 12 ( 2015 ), 5706--5722. Ali Borji, Ming-Ming Cheng, Huaizu Jiang, and Jia Li. 2015. Salient object detection: A benchmark. IEEE transactions on image processing, Vol. 24, 12 (2015), 5706--5722."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3052069"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.3014734"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00322"},{"key":"e_1_3_2_1_8_1","volume-title":"2021 b. Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306","author":"Chen Jieneng","year":"2021","unstructured":"Jieneng Chen , Yongyi Lu , Qihang Yu , Xiangde Luo , Ehsan Adeli , Yan Wang , Le Lu , Alan L Yuille , and Yuyin Zhou . 2021 b. Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 ( 2021 ). Jieneng Chen, Yongyi Lu, Qihang Yu, Xiangde Luo, Ehsan Adeli, Yan Wang, Le Lu, Alan L Yuille, and Yuyin Zhou. 2021 b. Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)."},{"key":"e_1_3_2_1_9_1","volume-title":"2020 c. EF-Net: A novel enhancement and fusion network for RGB-D saliency detection. Pattern Recognition","author":"Chen Qian","year":"2020","unstructured":"Qian Chen , Keren Fu , Ze Liu , Geng Chen , Hongwei Du , Bensheng Qiu , and Ling Shao . 2020 c. EF-Net: A novel enhancement and fusion network for RGB-D saliency detection. Pattern Recognition ( 2020 ), 107740. Qian Chen, Keren Fu, Ze Liu, Geng Chen, Hongwei Du, Bensheng Qiu, and Ling Shao. 2020 c. EF-Net: A novel enhancement and fusion network for RGB-D saliency detection. Pattern Recognition (2020), 107740."},{"key":"e_1_3_2_1_10_1","volume-title":"2021 a. RGB-D Salient Object Detection via 3D Convolutional Neural. AAAI","author":"Chen Qian","year":"2021","unstructured":"Qian Chen , Ze Liu , Yi Zhang , Keren Fu , Qijun Zhao , and Hongwei Du . 2021 a. RGB-D Salient Object Detection via 3D Convolutional Neural. AAAI ( 2021 ). Qian Chen, Ze Liu, Yi Zhang, Keren Fu, Qijun Zhao, and Hongwei Du. 2021 a. RGB-D Salient Object Detection via 3D Convolutional Neural. AAAI (2021)."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58598-3_31"},{"key":"e_1_3_2_1_12_1","volume-title":"2021 e. Global-Local Propagation Network for RGB-D Semantic Segmentation. arXiv preprint arXiv:2101.10801","author":"Chen Sihan","year":"2021","unstructured":"Sihan Chen , Xinxin Zhu , Wei Liu , Xingjian He , and Jing Liu . 2021 e. Global-Local Propagation Network for RGB-D Semantic Segmentation. arXiv preprint arXiv:2101.10801 ( 2021 ). Sihan Chen, Xinxin Zhu, Wei Liu, Xingjian He, and Jing Liu. 2021 e. Global-Local Propagation Network for RGB-D Semantic Segmentation. arXiv preprint arXiv:2101.10801 (2021)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00803"},{"key":"e_1_3_2_1_14_1","volume-title":"2020 a. DPANet: Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection","author":"Chen Zuyao","year":"2020","unstructured":"Zuyao Chen , Runmin Cong , Qianqian Xu , and Qingming Huang . 2020 a. DPANet: Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection . IEEE Transactions on Image Processing ( 2020 ). Zuyao Chen, Runmin Cong, Qianqian Xu, and Qingming Huang. 2020 a. DPANet: Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection. IEEE Transactions on Image Processing (2020)."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2632856.2632866"},{"key":"e_1_3_2_1_16_1","volume-title":"Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio.","author":"Cho Kyunghyun","year":"2014","unstructured":"Kyunghyun Cho , Bart Van Merri\u00ebnboer , Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014 . Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014). Kyunghyun Cho, Bart Van Merri\u00ebnboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)."},{"key":"e_1_3_2_1_17_1","volume-title":"Do We Really Need Explicit Position Encodings for Vision Transformers? arXiv preprint arXiv:2102.10882","author":"Chu Xiangxiang","year":"2021","unstructured":"Xiangxiang Chu , Bo Zhang , Zhi Tian , Xiaolin Wei , and Huaxia Xia . 2021. Do We Really Need Explicit Position Encodings for Vision Transformers? arXiv preprint arXiv:2102.10882 ( 2021 ). Xiangxiang Chu, Bo Zhang, Zhi Tian, Xiaolin Wei, and Huaxia Xia. 2021. Do We Really Need Explicit Position Encodings for Vision Transformers? arXiv preprint arXiv:2102.10882 (2021)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2009.5459296"},{"key":"e_1_3_2_1_19_1","volume-title":"International Conference on Learning Representations.","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy , Lucas Beyer , Alexander Kolesnikov , Dirk Weissenborn , Xiaohua Zhai , Thomas Unterthiner , Mostafa Dehghani , Matthias Minderer , Georg Heigold , Sylvain Gelly , Jakob Uszkoreit , and Neil Houlsby . 2021 . An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale . In International Conference on Learning Representations. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.487"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/3304415.3304515"},{"key":"e_1_3_2_1_22_1","volume-title":"Data Sets, and Large-Scale Benchmarks","author":"Fan Deng-Ping","year":"2020","unstructured":"Deng-Ping Fan , Zheng Lin , Zhao Zhang , Menglong Zhu , and Ming-Ming Cheng . 2020 a. Rethinking RGB-D Salient Object Detection: Models , Data Sets, and Large-Scale Benchmarks . IEEE Transactions on Neural Networks and Learning Systems ( 2020 ). Deng-Ping Fan, Zheng Lin, Zhao Zhang, Menglong Zhu, and Ming-Ming Cheng. 2020 a. Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks. IEEE Transactions on Neural Networks and Learning Systems (2020)."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58610-2_17"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00312"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2389616"},{"key":"e_1_3_2_1_26_1","volume-title":"PCT: Point Cloud Transformer. arXiv preprint arXiv:2012.09688","author":"Guo Meng-Hao","year":"2020","unstructured":"Meng-Hao Guo , Jun-Xiong Cai , Zheng-Ning Liu , Tai-Jiang Mu , Ralph R Martin , and Shi-Min Hu . 2020 . PCT: Point Cloud Transformer. arXiv preprint arXiv:2012.09688 (2020). Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R Martin, and Shi-Min Hu. 2020. PCT: Point Cloud Transformer. arXiv preprint arXiv:2012.09688 (2020)."},{"key":"e_1_3_2_1_27_1","unstructured":"Kai Han An Xiao Enhua Wu Jianyuan Guo Chunjing Xu and Yunhe Wang. 2021. Transformer in transformer. arXiv preprint arXiv:2103.00112(2021).  Kai Han An Xiao Enhua Wu Jianyuan Guo Chunjing Xu and Yunhe Wang. 2021. Transformer in transformer. arXiv preprint arXiv:2103.00112(2021)."},{"key":"e_1_3_2_1_28_1","volume-title":"Escaping the Big Data Paradigm with Compact Transformers. arXiv preprint arXiv:2104.05704","author":"Hassani Ali","year":"2021","unstructured":"Ali Hassani , Steven Walton , Nikhil Shah , Abulikemu Abuduweili , Jiachen Li , and Humphrey Shi . 2021. Escaping the Big Data Paradigm with Compact Transformers. arXiv preprint arXiv:2104.05704 ( 2021 ). Ali Hassani, Steven Walton, Nikhil Shah, Abulikemu Abuduweili, Jiachen Li, and Humphrey Shi. 2021. Escaping the Big Data Paradigm with Compact Transformers. arXiv preprint arXiv:2104.05704 (2021)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045183"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3069297"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2012.11.008"},{"key":"e_1_3_2_1_33_1","volume-title":"Proceedings, Part XVIII 16","author":"Ji Wei","year":"2020","unstructured":"Wei Ji , Jingjing Li , Miao Zhang , Yongri Piao , and Huchuan Lu . 2020 . Accurate rgb-d salient object detection via collaborative learning. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020 , Proceedings, Part XVIII 16 . Springer, 52--69. Wei Ji, Jingjing Li, Miao Zhang, Yongri Piao, and Huchuan Lu. 2020. Accurate rgb-d salient object detection via collaborative learning. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XVIII 16. Springer, 52--69."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2763321"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3060167"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2014.7025222"},{"key":"e_1_3_2_1_37_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba . 2014 . Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_38_1","volume-title":"2020 a. ASIF-Net: Attention steered interweave fusion network for RGB-D salient object detection","author":"Li Chongyi","year":"2020","unstructured":"Chongyi Li , Runmin Cong , Sam Kwong , Junhui Hou , Huazhu Fu , Guopu Zhu , Dingwen Zhang , and Qingming Huang . 2020 a. ASIF-Net: Attention steered interweave fusion network for RGB-D salient object detection . IEEE Transactions on Cybernetics ( 2020 ). Chongyi Li, Runmin Cong, Sam Kwong, Junhui Hou, Huazhu Fu, Guopu Zhu, Dingwen Zhang, and Qingming Huang. 2020 a. ASIF-Net: Attention steered interweave fusion network for RGB-D salient object detection. IEEE Transactions on Cybernetics (2020)."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58598-3_14"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3062689"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.2976689"},{"key":"e_1_3_2_1_42_1","volume-title":"2021 b. LocalViT: Bringing Locality to Vision Transformers. arXiv preprint arXiv:2104.05707","author":"Li Yawei","year":"2021","unstructured":"Yawei Li , Kai Zhang , Jiezhang Cao , Radu Timofte , and Luc Van Gool . 2021 b. LocalViT: Bringing Locality to Vision Transformers. arXiv preprint arXiv:2104.05707 ( 2021 ). Yawei Li, Kai Zhang, Jiezhang Cao, Radu Timofte, and Luc Van Gool. 2021 b. LocalViT: Bringing Locality to Vision Transformers. arXiv preprint arXiv:2104.05707 (2021)."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01377"},{"key":"e_1_3_2_1_44_1","volume-title":"Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030","author":"Liu Ze","year":"2021","unstructured":"Ze Liu , Yutong Lin , Yue Cao , Han Hu , Yixuan Wei , Zheng Zhang , Stephen Lin , and Baining Guo . 2021. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030 ( 2021 ). Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030 (2021)."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2019.07.012"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.01.045"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2694219"},{"key":"e_1_3_2_1_48_1","volume-title":"Robust Facial Expression Recognition with Convolutional Visual Transformers. arXiv preprint arXiv:2103.16854","author":"Ma Fuyan","year":"2021","unstructured":"Fuyan Ma , Bin Sun , and Shutao Li. 2021. Robust Facial Expression Recognition with Convolutional Visual Transformers. arXiv preprint arXiv:2103.16854 ( 2021 ). Fuyan Ma, Bin Sun, and Shutao Li. 2021. Robust Facial Expression Recognition with Convolutional Visual Transformers. arXiv preprint arXiv:2103.16854 (2021)."},{"key":"e_1_3_2_1_49_1","volume-title":"TrackFormer: Multi-Object Tracking with Transformers. arXiv preprint arXiv:2101.02702","author":"Meinhardt Tim","year":"2021","unstructured":"Tim Meinhardt , Alexander Kirillov , Laura Leal-Taixe , and Christoph Feichtenhofer . 2021. TrackFormer: Multi-Object Tracking with Transformers. arXiv preprint arXiv:2101.02702 ( 2021 ). Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixe, and Christoph Feichtenhofer. 2021. TrackFormer: Multi-Object Tracking with Transformers. arXiv preprint arXiv:2101.02702 (2021)."},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354877"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.5555\/876866.877499"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2020.103964"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58595-2_15"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10578-9_7"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2355041"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00735"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00908"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_2_1_59_1","volume-title":"End-to-End Trainable Multi-Instance Pose Estimation with Transformers. arXiv preprint arXiv:2103.12115","author":"Stoffl Lucas","year":"2021","unstructured":"Lucas Stoffl , Maxime Vidal , and Alexander Mathis . 2021. End-to-End Trainable Multi-Instance Pose Estimation with Transformers. arXiv preprint arXiv:2103.12115 ( 2021 ). Lucas Stoffl, Maxime Vidal, and Alexander Mathis. 2021. End-to-End Trainable Multi-Instance Pose Estimation with Transformers. arXiv preprint arXiv:2103.12115 (2021)."},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.3007457"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00146"},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_2_1_63_1","volume-title":"High-Fidelity Pluralistic Image Completion with Transformers. arXiv preprint arXiv:2103.14031","author":"Wan Ziyu","year":"2021","unstructured":"Ziyu Wan , Jingbo Zhang , Dongdong Chen , and Jing Liao . 2021. High-Fidelity Pluralistic Image Completion with Transformers. arXiv preprint arXiv:2103.14031 ( 2021 ). Ziyu Wan, Jingbo Zhang, Dongdong Chen, and Jing Liao. 2021. High-Fidelity Pluralistic Image Completion with Transformers. arXiv preprint arXiv:2103.14031 (2021)."},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2913107"},{"key":"e_1_3_2_1_65_1","volume-title":"A deep network solution for attention and aesthetics aware photo cropping","author":"Wang Wenguan","year":"2018","unstructured":"Wenguan Wang , Jianbing Shen , and Haibin Ling . 2018. A deep network solution for attention and aesthetics aware photo cropping . IEEE transactions on pattern analysis and machine intelligence, Vol. 41 , 7 ( 2018 ), 1531--1544. Wenguan Wang, Jianbing Shen, and Haibin Ling. 2018. A deep network solution for attention and aesthetics aware photo cropping. IEEE transactions on pattern analysis and machine intelligence, Vol. 41, 7 (2018), 1531--1544."},{"key":"e_1_3_2_1_66_1","volume-title":"Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122","author":"Wang Wenhai","year":"2021","unstructured":"Wenhai Wang , Enze Xie , Xiang Li , Deng-Ping Fan , Kaitao Song , Ding Liang , Tong Lu , Ping Luo , and Ling Shao . 2021. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122 ( 2021 ). Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. 2021. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122 (2021)."},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.3037470"},{"key":"e_1_3_2_1_68_1","volume-title":"Proceedings of the Asian Conference on Computer Vision. 1--17","author":"Wang Yue","year":"2020","unstructured":"Yue Wang , Yuke Li , James H Elder , Runmin Wu , Huchuan Lu , and Lu Zhang . 2020 b. Synergistic saliency and depth prediction for RGB-D saliency detection . In Proceedings of the Asian Conference on Computer Vision. 1--17 . Yue Wang, Yuke Li, James H Elder, Runmin Wu, Huchuan Lu, and Lu Zhang. 2020 b. Synergistic saliency and depth prediction for RGB-D saliency detection. In Proceedings of the Asian Conference on Computer Vision. 1--17."},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6916"},{"key":"e_1_3_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"e_1_3_2_1_71_1","volume-title":"2020 b. Visual transformers: Token-based image representation and processing for computer vision. arXiv preprint arXiv:2006.03677","author":"Wu Bichen","year":"2020","unstructured":"Bichen Wu , Chenfeng Xu , Xiaoliang Dai , Alvin Wan , Peizhao Zhang , Masayoshi Tomizuka , Kurt Keutzer , and Peter Vajda . 2020 b. Visual transformers: Token-based image representation and processing for computer vision. arXiv preprint arXiv:2006.03677 ( 2020 ). Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Masayoshi Tomizuka, Kurt Keutzer, and Peter Vajda. 2020 b. Visual transformers: Token-based image representation and processing for computer vision. arXiv preprint arXiv:2006.03677 (2020)."},{"key":"e_1_3_2_1_72_1","volume-title":"CvT: Introducing Convolutions to Vision Transformers. arXiv preprint arXiv:2103.15808","author":"Wu Haiping","year":"2021","unstructured":"Haiping Wu , Bin Xiao , Noel Codella , Mengchen Liu , Xiyang Dai , Lu Yuan , and Lei Zhang . 2021. CvT: Introducing Convolutions to Vision Transformers. arXiv preprint arXiv:2103.15808 ( 2021 ). Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, and Lei Zhang. 2021. CvT: Introducing Convolutions to Vision Transformers. arXiv preprint arXiv:2103.15808 (2021)."},{"key":"e_1_3_2_1_73_1","volume-title":"2020 a. MobileSal: Extremely Efficient RGB-D Salient Object Detection. arXiv preprint arXiv:2012.13095","author":"Wu Yu-Huan","year":"2020","unstructured":"Yu-Huan Wu , Yun Liu , Jun Xu , Jia-Wang Bian , Yuchao Gu , and Ming-Ming Cheng . 2020 a. MobileSal: Extremely Efficient RGB-D Salient Object Detection. arXiv preprint arXiv:2012.13095 ( 2020 ). Yu-Huan Wu, Yun Liu, Jun Xu, Jia-Wang Bian, Yuchao Gu, and Ming-Ming Cheng. 2020 a. MobileSal: Extremely Efficient RGB-D Salient Object Detection. arXiv preprint arXiv:2012.13095 (2020)."},{"key":"e_1_3_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00403"},{"key":"e_1_3_2_1_75_1","volume-title":"CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation. arXiv preprint arXiv:2103.03024","author":"Xie Yutong","year":"2021","unstructured":"Yutong Xie , Jianpeng Zhang , Chunhua Shen , and Yong Xia . 2021. CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation. arXiv preprint arXiv:2103.03024 ( 2021 ). Yutong Xie, Jianpeng Zhang, Chunhua Shen, and Yong Xia. 2021. CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation. arXiv preprint arXiv:2103.03024 (2021)."},{"key":"e_1_3_2_1_76_1","volume-title":"2021 a. Co-Scale Conv-Attentional Image Transformers. arXiv preprint arXiv:2104.06399","author":"Xu Weijian","year":"2021","unstructured":"Weijian Xu , Yifan Xu , Tyler Chang , and Zhuowen Tu . 2021 a. Co-Scale Conv-Attentional Image Transformers. arXiv preprint arXiv:2104.06399 ( 2021 ). Weijian Xu, Yifan Xu, Tyler Chang, and Zhuowen Tu. 2021 a. Co-Scale Conv-Attentional Image Transformers. arXiv preprint arXiv:2104.06399 (2021)."},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00424"},{"key":"e_1_3_2_1_78_1","volume-title":"Jiashi Feng, and Shuicheng Yan.","author":"Yuan Li","year":"2021","unstructured":"Li Yuan , Yunpeng Chen , Tao Wang , Weihao Yu , Yujun Shi , Francis EH Tay , Jiashi Feng, and Shuicheng Yan. 2021 . Tokens-to-token vit: Training vision transformers from scratch on imagenet. arXiv preprint arXiv:2101.11986 (2021). Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Francis EH Tay, Jiashi Feng, and Shuicheng Yan. 2021. Tokens-to-token vit: Training vision transformers from scratch on imagenet. arXiv preprint arXiv:2101.11986 (2021)."},{"key":"e_1_3_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00631"},{"key":"e_1_3_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00861"},{"key":"e_1_3_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00353"},{"key":"e_1_3_2_1_82_1","volume-title":"2021 a. Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding. arXiv preprint arXiv:2103.15358","author":"Zhang Pengchuan","year":"2021","unstructured":"Pengchuan Zhang , Xiyang Dai , Jianwei Yang , Bin Xiao , Lu Yuan , Lei Zhang , and Jianfeng Gao . 2021 a. Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding. arXiv preprint arXiv:2103.15358 ( 2021 ). Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, and Jianfeng Gao. 2021 a. Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding. arXiv preprint arXiv:2103.15358 (2021)."},{"key":"e_1_3_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2019.107130"},{"key":"e_1_3_2_1_84_1","volume-title":"2021 c. Transfuse: Fusing transformers and cnns for medical image segmentation. arXiv preprint arXiv:2102.08005","author":"Zhang Yundong","year":"2021","unstructured":"Yundong Zhang , Huiye Liu , and Qiang Hu . 2021 c. Transfuse: Fusing transformers and cnns for medical image segmentation. arXiv preprint arXiv:2102.08005 ( 2021 ). Yundong Zhang, Huiye Liu, and Qiang Hu. 2021 c. Transfuse: Fusing transformers and cnns for medical image segmentation. arXiv preprint arXiv:2102.08005 (2021)."},{"key":"e_1_3_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3049959"},{"key":"e_1_3_2_1_86_1","volume-title":"2020 a. Point transformer. arXiv preprint arXiv:2012.09164","author":"Zhao Hengshuang","year":"2020","unstructured":"Hengshuang Zhao , Li Jiang , Jiaya Jia , Philip Torr , and Vladlen Koltun . 2020 a. Point transformer. arXiv preprint arXiv:2012.09164 ( 2020 ). Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, and Vladlen Koltun. 2020 a. Point transformer. arXiv preprint arXiv:2012.09164 (2020)."},{"key":"e_1_3_2_1_87_1","volume-title":"Cees GM Snoek, and Joseph Tighe","author":"Zhao Jiaojiao","year":"2021","unstructured":"Jiaojiao Zhao , Xinyu Li , Chunhui Liu , Shuai Bing , Hao Chen , Cees GM Snoek, and Joseph Tighe . 2021 . TubeR: Tube- Transformer for Action Detection . arXiv preprint arXiv:2104.00969 (2021). Jiaojiao Zhao, Xinyu Li, Chunhui Liu, Shuai Bing, Hao Chen, Cees GM Snoek, and Joseph Tighe. 2021. TubeR: Tube-Transformer for Action Detection. arXiv preprint arXiv:2104.00969 (2021)."},{"key":"e_1_3_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413855"},{"key":"e_1_3_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00405"},{"key":"e_1_3_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58542-6_39"},{"key":"e_1_3_2_1_91_1","volume-title":"2021 a. TFill: Image Completion via a Transformer-Based Architecture. arXiv preprint arXiv:2104.00845","author":"Zheng Chuanxia","year":"2021","unstructured":"Chuanxia Zheng , Tat-Jen Cham , and Jianfei Cai . 2021 a. TFill: Image Completion via a Transformer-Based Architecture. arXiv preprint arXiv:2104.00845 ( 2021 ). Chuanxia Zheng, Tat-Jen Cham, and Jianfei Cai. 2021 a. TFill: Image Completion via a Transformer-Based Architecture. arXiv preprint arXiv:2104.00845 (2021)."},{"key":"e_1_3_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"e_1_3_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2019.00042"},{"key":"e_1_3_2_1_94_1","volume-title":"AAformer: Auto-Aligned Transformer for Person Re-Identification. arXiv preprint arXiv:2104.00921","author":"Zhu Kuan","year":"2021","unstructured":"Kuan Zhu , Haiyun Guo , Shiliang Zhang , Yaowei Wang , Gaopan Huang , Honglin Qiao , Jing Liu , Jinqiao Wang , and Ming Tang . 2021. AAformer: Auto-Aligned Transformer for Person Re-Identification. arXiv preprint arXiv:2104.00921 ( 2021 ). Kuan Zhu, Haiyun Guo, Shiliang Zhang, Yaowei Wang, Gaopan Huang, Honglin Qiao, Jing Liu, Jinqiao Wang, and Ming Tang. 2021. AAformer: Auto-Aligned Transformer for Person Re-Identification. arXiv preprint arXiv:2104.00921 (2021)."}],"event":{"name":"MM '21: ACM Multimedia Conference","location":"Virtual Event China","acronym":"MM '21","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 29th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3475601","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474085.3475601","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:23Z","timestamp":1750193303000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3475601"}},"subtitle":["RGB-D Salient Object Detection with a Triplet Transformer Embedding Network"],"short-title":[],"issued":{"date-parts":[[2021,10,17]]},"references-count":94,"alternative-id":["10.1145\/3474085.3475601","10.1145\/3474085"],"URL":"https:\/\/doi.org\/10.1145\/3474085.3475601","relation":{},"subject":[],"published":{"date-parts":[[2021,10,17]]},"assertion":[{"value":"2021-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}