{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T18:52:33Z","timestamp":1775069553784,"version":"3.50.1"},"reference-count":167,"publisher":"Association for Computing Machinery (ACM)","issue":"7","license":[{"start":{"date-parts":[[2024,4,9]],"date-time":"2024-04-09T00:00:00Z","timestamp":1712620800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon 2020","award":["957337"],"award-info":[{"award-number":["957337"]}]},{"DOI":"10.13039\/501100004836","name":"Danish Council for Independent Research","doi-asserted-by":"crossref","award":["9131-00119B"],"award-info":[{"award-number":["9131-00119B"]}],"id":[{"id":"10.13039\/501100004836","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2024,7,31]]},"abstract":"<jats:p>Cameras in modern devices such as smartphones, satellites and medical equipment are capable of capturing very high resolution images and videos. Such high-resolution data often need to be processed by deep learning models for cancer detection, automated road navigation, weather prediction, surveillance, optimizing agricultural processes and many other applications. Using high-resolution images and videos as direct inputs for deep learning models creates many challenges due to their high number of parameters, computation cost, inference latency and GPU memory consumption. Simple approaches such as resizing the images to a lower resolution are common in the literature, however, they typically significantly decrease accuracy. 
Several works in the literature propose better alternatives in order to deal with the challenges of high-resolution data and improve accuracy and speed while complying with hardware limitations and time restrictions. This survey describes such efficient high-resolution deep learning methods, summarizes real-world applications of high-resolution deep learning, and provides comprehensive information about available high-resolution datasets.<\/jats:p>","DOI":"10.1145\/3645107","type":"journal-article","created":{"date-parts":[[2024,2,8]],"date-time":"2024-02-08T12:08:10Z","timestamp":1707394090000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":37,"title":["Efficient High-Resolution Deep Learning: A Survey"],"prefix":"10.1145","volume":"56","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8624-8661","authenticated-orcid":false,"given":"Arian","family":"Bakhtiarnia","sequence":"first","affiliation":[{"name":"DIGIT, Aarhus University, Aarhus, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5303-9804","authenticated-orcid":false,"given":"Qi","family":"Zhang","sequence":"additional","affiliation":[{"name":"DIGIT, Aarhus University, Aarhus, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4807-1345","authenticated-orcid":false,"given":"Alexandros","family":"Iosifidis","sequence":"additional","affiliation":[{"name":"DIGIT, Aarhus University, Aarhus, Denmark"}]}],"member":"320","published-online":{"date-parts":[[2024,4,9]]},"reference":[{"key":"e_1_3_3_2_2","volume-title":"IEEE Winter Conference on Applications of Computer Vision","author":"Aghaei Maya","year":"2021","unstructured":"Maya Aghaei, Matteo Bustreo, Yiming Wang, Gian Luca Bailo, Pietro Morerio, et\u00a0al. 2021. Single image human proxemics estimation for visual social distancing. 
In IEEE Winter Conference on Applications of Computer Vision."},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.scs.2020.102571"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00330-019-06170-3"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.3390\/rs12030458"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2019.05.010"},{"key":"e_1_3_3_7_2","article-title":"Dynamic split computing for efficient deep edge intelligence","author":"Bakhtiarnia Arian","year":"2022","unstructured":"Arian Bakhtiarnia, Nemanja Milo\u0161evi\u0107, Qi Zhang, Dragana Bajovi\u0107, and Alexandros Iosifidis. 2022. Dynamic split computing for efficient deep edge intelligence. arXiv preprint arXiv:2205.11269 (2022).","journal-title":"arXiv preprint arXiv:2205.11269"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1117\/1.JRS.11.042609"},{"key":"e_1_3_3_9_2","article-title":"SALISA: Saliency-based input sampling for efficient video object detection","author":"Bejnordi Babak Ehteshami","year":"2022","unstructured":"Babak Ehteshami Bejnordi, Amirhossein Habibian, Fatih Porikli, and Amir Ghodrati. 2022. SALISA: Saliency-based input sampling for efficient video object detection. arXiv preprint arXiv:2204.02397 (2022).","journal-title":"arXiv preprint arXiv:2204.02397"},{"key":"e_1_3_3_10_2","article-title":"Longformer: The long-document transformer","author":"Beltagy Iz","year":"2020","unstructured":"Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020).","journal-title":"arXiv preprint arXiv:2004.05150"},{"key":"e_1_3_3_11_2","first-page":"76","volume-title":"Mobile Multimedia\/Image Processing, Security, and Applications","author":"Boehrer N.","year":"2020","unstructured":"N. Boehrer, A. Gabriel, A. Brandt, W. Uijens, L. Kampmeijer, et\u00a0al. 2020. 
Onboard ROI selection for aerial surveillance using a high resolution, high framerate camera. In Mobile Multimedia\/Image Processing, Security, and Applications, Vol. 11399. 76\u201395."},{"key":"e_1_3_3_12_2","first-page":"87","volume-title":"Computational Imaging XI","author":"Brady David J.","year":"2013","unstructured":"David J. Brady, Daniel L. Marks, Steven Feller, Michael Gehm, Dathon Golish, et\u00a0al. 2013. Petapixel photography and the limits of camera information capacity. In Computational Imaging XI. 87\u201393."},{"key":"e_1_3_3_13_2","article-title":"Copernicus global land service: Land cover 100m: collection 3: Epoch 2019: Globe","author":"Buchhorn Marcel","year":"2020","unstructured":"Marcel Buchhorn, Bruno Smets, Luc Bertels, Bert De Roo, Myroslava Lesiv, et\u00a0al. 2020. Copernicus global land service: Land cover 100m: collection 3: Epoch 2019: Globe. Version V3. 0.1) [Data set] (2020).","journal-title":"Version V3. 0.1) [Data set]"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00132"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_22"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_40"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-020-76282-0"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.12.049"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01567"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00398"},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTARS.2020.3005403"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-021-25296-x"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2017.2765695"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publ
isher","DOI":"10.1109\/CVPR.2018.00646"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.350"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/EUROCON52738.2021.9535550"},{"key":"e_1_3_3_28_2","first-page":"16344","volume-title":"Advances in Neural Information Processing Systems","volume":"35","author":"Dao Tri","year":"2022","unstructured":"Tri Dao, Dan Fu, Stefano Ermon, Atri Rudra, and Christopher R\u00e9. 2022. FlashAttention: Fast and memory-efficient exact attention with IO-awareness. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, et\u00a0al. (Eds.), Vol. 35. Curran Associates, Inc., 16344\u201316359. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/file\/67d57c32e20fd0a7a302cb81d36e40d5-Paper-Conference.pdf"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2018.00031"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.3389\/fmed.2019.00264"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2022.3168697"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00300"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","unstructured":"Piotr Dollar Christian Wojek Bernt Schiele and Pietro Perona. 2009. Caltech Pedestrians. DOI:10.1109\/CVPR.2009.5206631","DOI":"10.1109\/CVPR.2009.5206631"},{"key":"e_1_3_3_35_2","article-title":"Adversarial feature learning","author":"Donahue Jeff","year":"2016","unstructured":"Jeff Donahue, Philipp Kr\u00e4henb\u00fchl, and Trevor Darrell. 2016. Adversarial feature learning. 
arXiv preprint arXiv:1605.09782 (2016).","journal-title":"arXiv preprint arXiv:1605.09782"},{"key":"e_1_3_3_36_2","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1007\/978-3-030-00889-5_36","volume-title":"Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support","author":"Dong Nanqing","year":"2018","unstructured":"Nanqing Dong, Michael Kampffmeyer, Xiaodan Liang, Zeya Wang, Wei Dai, et\u00a0al. 2018. Reinforced auto-zoom net: Towards accurate and fast breast cancer segmentation in whole-slide images. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. 317\u2013325."},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01181"},{"key":"e_1_3_3_38_2","volume-title":"International Conference on Learning Representations","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, et\u00a0al. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=YicbFdNTTy"},{"key":"e_1_3_3_39_2","article-title":"Zoom in to where it matters: A hierarchical graph based model for mammogram analysis","author":"Du Hao","year":"2019","unstructured":"Hao Du, Jiashi Feng, and Mengling Feng. 2019. Zoom in to where it matters: A hierarchical graph based model for mammogram analysis. arXiv preprint arXiv:1912.07517 (2019).","journal-title":"arXiv preprint arXiv:1912.07517"},{"issue":"7","key":"e_1_3_3_40_2","first-page":"1665","article-title":"Model parallelism optimization for distributed inference via decoupled CNN structure","volume":"32","author":"Du Jiangsu","year":"2020","unstructured":"Jiangsu Du, Xin Zhu, Minghua Shen, Yunfei Du, Yutong Lu, et\u00a0al. 2020. Model parallelism optimization for distributed inference via decoupled CNN structure. 
IEEE Transactions on Parallel and Distributed Systems 32, 7 (2020), 1665\u20131676.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"e_1_3_3_41_2","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1007\/BFb0086566","volume-title":"Constructive Theory of Functions of Several Variables","author":"Duchon Jean","year":"1977","unstructured":"Jean Duchon. 1977. Splines minimizing rotation-invariant semi-norms in Sobolev spaces. In Constructive Theory of Functions of Several Variables. 85\u2013100."},{"issue":"23","key":"e_1_3_3_42_2","first-page":"4321","article-title":"Whole slide imaging in pathology: Advantages, limitations, and emerging perspectives","volume":"7","author":"Farahani Navid","year":"2015","unstructured":"Navid Farahani, Anil V. Parwani, Liron Pantanowitz, et\u00a0al. 2015. Whole slide imaging in pathology: Advantages, limitations, and emerging perspectives. Pathology and Laboratory Medicine International 7, 23-33 (2015), 4321.","journal-title":"Pathology and Laboratory Medicine International"},{"key":"e_1_3_3_43_2","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1007\/978-3-030-64559-5_30","volume-title":"Advances in Visual Computing","author":"Ferreira Cristiane B. R.","year":"2020","unstructured":"Cristiane B. R. Ferreira, Helio Pedrini, Wanderley de Souza Alencar, William D. Ferreira, Thyago Peres Carvalho, et\u00a0al. 2020. Where\u2019s Wally: A gigapixel image study for face recognition in crowds. In Advances in Visual Computing. 386\u2013397."},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejrad.2022.110294"},{"key":"e_1_3_3_45_2","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1007\/978-3-030-23937-4_2","volume-title":"European Congress on Digital Pathology","author":"Gamper Jevgenij","year":"2019","unstructured":"Jevgenij Gamper, Navid Alemi Koohbanani, Ksenija Benet, Ali Khuram, and Nasir Rajpoot. 2019. 
PanNuke: An open pan-cancer histology dataset for nuclei instance segmentation and classification. In European Congress on Digital Pathology. 11\u201319."},{"key":"e_1_3_3_46_2","article-title":"CNN-based density estimation and crowd counting: A survey","author":"Gao Guangshuai","year":"2020","unstructured":"Guangshuai Gao, Junyu Gao, Qingjie Liu, Qi Wang, and Yunhong Wang. 2020. CNN-based density estimation and crowd counting: A survey. arXiv preprint arXiv:2003.12783 (2020).","journal-title":"arXiv preprint arXiv:2003.12783"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00724"},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354978"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2019.101563"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01178"},{"key":"e_1_3_3_51_2","article-title":"Faster neural networks straight from JPEG","volume":"31","author":"Gueguen Lionel","year":"2018","unstructured":"Lionel Gueguen, Alex Sergeev, Ben Kadlec, Rosanne Liu, and Jason Yosinski. 2018. Faster neural networks straight from JPEG. Advances in Neural Information Processing Systems 31 (2018).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/s41095-021-0240-x"},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.3390\/rs11171976"},{"key":"e_1_3_3_54_2","article-title":"Dynamic neural networks: A survey","author":"Han Yizeng","year":"2021","unstructured":"Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, et\u00a0al. 2021. Dynamic neural networks: A survey. 
arXiv preprint arXiv:2102.04906 (2021).","journal-title":"arXiv preprint arXiv:2102.04906"},{"key":"e_1_3_3_55_2","article-title":"Pruning self-attentions into convolutional layers in single path","author":"He Haoyu","year":"2021","unstructured":"Haoyu He, Jing Liu, Zizheng Pan, Jianfei Cai, Jing Zhang, et\u00a0al. 2021. Pruning self-attentions into convolutional layers in single path. arXiv preprint arXiv:2111.11802 (2021).","journal-title":"arXiv preprint arXiv:2111.11802"},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_3_57_2","article-title":"Continual inference: A library for efficient online inference with deep neural networks in PyTorch","author":"Hedegaard Lukas","year":"2022","unstructured":"Lukas Hedegaard and Alexandros Iosifidis. 2022. Continual inference: A library for efficient online inference with deep neural networks in PyTorch. arXiv preprint: arXiv:2204.03418 (2022).","journal-title":"arXiv preprint: arXiv:2204.03418"},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compmedimag.2021.101866"},{"key":"e_1_3_3_59_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_3_60_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41524-020-00363-x"},{"key":"e_1_3_3_61_2","article-title":"Ambient sound helps: Audiovisual crowd counting in extreme conditions","author":"Hu Di","year":"2020","unstructured":"Di Hu, Lichao Mou, Qingzhong Wang, Junyu Gao, Yuansheng Hua, et\u00a0al. 2020. Ambient sound helps: Audiovisual crowd counting in extreme conditions. 
arXiv preprint arXiv:2005.07097 (2020).","journal-title":"arXiv preprint arXiv:2005.07097"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2926463"},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.329"},{"key":"e_1_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01216-8_33"},{"key":"e_1_3_3_65_2","article-title":"Spatial transformer networks","volume":"28","author":"Jaderberg Max","year":"2015","unstructured":"Max Jaderberg, Karen Simonyan, Andrew Zisserman, et\u00a0al. 2015. Spatial transformer networks. Advances in Neural Information Processing Systems 28 (2015).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00045"},{"key":"e_1_3_3_67_2","doi-asserted-by":"publisher","DOI":"10.3390\/rs14071552"},{"key":"e_1_3_3_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447993.3483274"},{"key":"e_1_3_3_69_2","article-title":"Learning to downsample for segmentation of ultra-high resolution images","author":"Jin Chen","year":"2021","unstructured":"Chen Jin, Ryutaro Tanno, Thomy Mertzanidou, Eleftheria Panagiotaki, and Daniel C. Alexander. 2021. Learning to downsample for segmentation of ultra-high resolution images. arXiv preprint arXiv:2109.11071 (2021).","journal-title":"arXiv preprint arXiv:2109.11071"},{"issue":"2","key":"e_1_3_3_70_2","first-page":"9","article-title":"Comparative analysis between DCT & DWT techniques of image compression","volume":"1","author":"Katharotiya Anilkumar","year":"2011","unstructured":"Anilkumar Katharotiya, Swati Patel, and Mahesh Goyani. 2011. Comparative analysis between DCT & DWT techniques of image compression. 
Journal of Information Engineering and Applications 1, 2 (2011), 9\u201317.","journal-title":"Journal of Information Engineering and Applications"},{"key":"e_1_3_3_71_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btaa1094"},{"key":"e_1_3_3_72_2","volume-title":"International Conference on Learning Representations","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations."},{"key":"e_1_3_3_73_2","article-title":"Auto-encoding variational Bayes","author":"Kingma Diederik P.","year":"2013","unstructured":"Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013).","journal-title":"arXiv preprint arXiv:1312.6114"},{"key":"e_1_3_3_74_2","volume-title":"International Conference on Learning Representations","author":"Kipf Thomas N.","year":"2017","unstructured":"Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations."},{"key":"e_1_3_3_75_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature11412"},{"issue":"6","key":"e_1_3_3_76_2","first-page":"E477\u2013E483","article-title":"KID project: An internet-based digital video atlas of capsule endoscopy for research purposes","volume":"5","author":"Koulaouzidis Anastasios","year":"2017","unstructured":"Anastasios Koulaouzidis, Dimitris Iakovidis, Diana Yung, Emanuele Rondonotti, Uri Kopylov, et\u00a0al. 2017. KID project: An internet-based digital video atlas of capsule endoscopy for research purposes. 
Endoscopy International Open 5, 6 (2017), E477\u2013E483.","journal-title":"Endoscopy International Open"},{"key":"e_1_3_3_77_2","doi-asserted-by":"publisher","DOI":"10.5555\/2999134.2999257"},{"key":"e_1_3_3_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW54120.2021.00072"},{"key":"e_1_3_3_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"issue":"3","key":"e_1_3_3_80_2","first-page":"E415\u2013E420","article-title":"CAD-CAP: A 25,000-image database serving the development of artificial intelligence for capsule endoscopy","volume":"8","author":"Leenhardt Romain","year":"2020","unstructured":"Romain Leenhardt, Cynthia Li, Jean-Philippe Le Mouel, Gabriel Rahmi, Jean Christophe Saurin, et\u00a0al. 2020. CAD-CAP: A 25,000-image database serving the development of artificial intelligence for capsule endoscopy. Endosc. Int. Open 8, 3 (2020), E415\u2013E420.","journal-title":"Endosc. Int. Open"},{"key":"e_1_3_3_81_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.10.006"},{"key":"e_1_3_3_82_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58536-5_28"},{"key":"e_1_3_3_83_2","article-title":"Q-ViT: Fully differentiable quantization for vision transformer","author":"Li Zhexin","year":"2022","unstructured":"Zhexin Li, Tong Yang, Peisong Wang, and Jian Cheng. 2022. Q-ViT: Fully differentiable quantization for vision transformer. 
arXiv preprint arXiv:2201.07703 (2022).","journal-title":"arXiv preprint arXiv:2201.07703"},{"key":"e_1_3_3_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMI.2019.2891305"},{"key":"e_1_3_3_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00865"},{"key":"e_1_3_3_86_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_3_87_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00131"},{"key":"e_1_3_3_88_2","article-title":"Detecting cancer metastases on gigapixel pathology images","author":"Liu Yun","year":"2017","unstructured":"Yun Liu, Krishna Gadepalli, Mohammad Norouzi, George E. Dahl, Timo Kohlberger, et\u00a0al. 2017. Detecting cancer metastases on gigapixel pathology images. arXiv preprint arXiv:1703.02442 (2017).","journal-title":"arXiv preprint arXiv:1703.02442"},{"key":"e_1_3_3_89_2","volume-title":"International Conference on Learning Representations","author":"Loshchilov Ilya","year":"2019","unstructured":"Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In International Conference on Learning Representations."},{"key":"e_1_3_3_90_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00138"},{"key":"e_1_3_3_91_2","doi-asserted-by":"publisher","DOI":"10.1109\/MFI52462.2021.9591201"},{"key":"e_1_3_3_92_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.agrformet.2018.10.013"},{"key":"e_1_3_3_93_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00222"},{"key":"e_1_3_3_94_2","volume-title":"IEEE International Workshop on Benchmarking Facial Image Analysis Technologies","author":"Koestinger Martin","year":"2011","unstructured":"Martin Koestinger, Paul Wohlhart, Peter M. Roth, and Horst Bischof. 2011. Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. 
In IEEE International Workshop on Benchmarking Facial Image Analysis Technologies."},{"key":"e_1_3_3_95_2","article-title":"Split computing and early exiting for deep learning applications: Survey and research challenges","author":"Matsubara Yoshitomo","year":"2021","unstructured":"Yoshitomo Matsubara, Marco Levorato, and Francesco Restuccia. 2021. Split computing and early exiting for deep learning applications: Survey and research challenges. Comput. Surveys (2021).","journal-title":"Comput. Surveys"},{"key":"e_1_3_3_96_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.crad.2021.01.010"},{"key":"e_1_3_3_97_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01249-6_34"},{"key":"e_1_3_3_98_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.acra.2011.09.014"},{"key":"e_1_3_3_99_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.119"},{"key":"e_1_3_3_100_2","article-title":"Modern hierarchical, agglomerative clustering algorithms","author":"M\u00fcllner Daniel","year":"2011","unstructured":"Daniel M\u00fcllner. 2011. Modern hierarchical, agglomerative clustering algorithms. arXiv preprint arXiv:1109.2378 (2011).","journal-title":"arXiv preprint arXiv:1109.2378"},{"key":"e_1_3_3_101_2","article-title":"Faster causal attention over large sequences through sparse flash attention","author":"Pagliardini Matteo","year":"2023","unstructured":"Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, and Fran\u00e7ois Fleuret. 2023. Faster causal attention over large sequences through sparse flash attention. 
arXiv preprint arXiv:2306.01160 (2023).","journal-title":"arXiv preprint arXiv:2306.01160"},{"key":"e_1_3_3_102_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3040591"},{"key":"e_1_3_3_103_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01240-3_4"},{"key":"e_1_3_3_104_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_3_3_105_2","article-title":"Faster R-CNN: Towards real-time object detection with region proposal networks","volume":"28","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems 28 (2015).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_106_2","volume-title":"H. 264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia","author":"Richardson Iain E.","year":"2004","unstructured":"Iain E. Richardson. 2004. H. 264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia. John Wiley & Sons."},{"key":"e_1_3_3_107_2","doi-asserted-by":"publisher","DOI":"10.1109\/eScience.2018.00130"},{"key":"e_1_3_3_108_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.352"},{"key":"e_1_3_3_109_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_3_110_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2020.2978717"},{"key":"e_1_3_3_111_2","volume-title":"Fine International Conference on Gigapixel Imaging for Science","author":"Sargent Randy","year":"2010","unstructured":"Randy Sargent, Chris Bartley, Paul Dille, Jeff Keller, Illah Nourbakhsh, et\u00a0al. 2010. Timelapse GigaPan: Capturing, sharing, and exploring timelapse gigapixel imagery. 
In Fine International Conference on Gigapixel Imaging for Science."},{"key":"e_1_3_3_112_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcct.2022.02.003"},{"key":"e_1_3_3_113_2","article-title":"OverFeat: Integrated recognition, localization and detection using convolutional networks","author":"Sermanet Pierre","year":"2013","unstructured":"Pierre Sermanet, David Eigen, Xiang Zhang, Micha\u00ebl Mathieu, Rob Fergus, et\u00a0al. 2013. OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229 (2013).","journal-title":"arXiv preprint arXiv:1312.6229"},{"key":"e_1_3_3_114_2","article-title":"Megatron-LM: Training multi-billion parameter language models using model parallelism","author":"Shoeybi Mohammad","year":"2019","unstructured":"Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, et\u00a0al. 2019. Megatron-LM: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053 (2019).","journal-title":"arXiv preprint arXiv:1909.08053"},{"key":"e_1_3_3_115_2","article-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).","journal-title":"arXiv preprint arXiv:1409.1556"},{"key":"e_1_3_3_116_2","article-title":"JHU-CROWD++: Large-scale crowd counting dataset and a benchmark method","author":"Sindagi Vishwanath A.","year":"2020","unstructured":"Vishwanath A. Sindagi, Rajeev Yasarla, and Vishal M. Patel. 2020. JHU-CROWD++: Large-scale crowd counting dataset and a benchmark method. 
Technical Report (2020).","journal-title":"Technical Report"},{"key":"e_1_3_3_117_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i3.16360"},{"key":"e_1_3_3_118_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACVW54805.2022.00063"},{"key":"e_1_3_3_119_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2020.101813"},{"key":"e_1_3_3_120_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00252"},{"key":"e_1_3_3_121_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2021.12.004"},{"key":"e_1_3_3_122_2","first-page":"6105","volume-title":"International Conference on Machine Learning","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning. 6105\u20136114."},{"key":"e_1_3_3_123_2","article-title":"Efficient transformers: A survey","author":"Tay Yi","year":"2020","unstructured":"Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. 2020. Efficient transformers: A survey. arXiv preprint arXiv:2009.06732 (2020).","journal-title":"arXiv preprint arXiv:2009.06732"},{"key":"e_1_3_3_124_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2936841"},{"key":"e_1_3_3_125_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01525"},{"key":"e_1_3_3_126_2","doi-asserted-by":"publisher","DOI":"10.1145\/2812802"},{"key":"e_1_3_3_127_2","article-title":"Gigapixel behavioral and neural activity imaging with a novel multi-camera array microscope","author":"Thomson Eric","year":"2021","unstructured":"Eric Thomson, Mark Harfouche, Pavan Konda, Catherine W. Seitz, Kanghyun Kim, et\u00a0al. 2021. Gigapixel behavioral and neural activity imaging with a novel multi-camera array microscope. 
bioRxiv (2021).","journal-title":"bioRxiv"},{"key":"e_1_3_3_128_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.bdr.2017.06.002"},{"key":"e_1_3_3_129_2","doi-asserted-by":"publisher","DOI":"10.1109\/BIBM47256.2019.8983226"},{"key":"e_1_3_3_130_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11063-019-10156-z"},{"key":"e_1_3_3_131_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107407"},{"key":"e_1_3_3_132_2","doi-asserted-by":"publisher","DOI":"10.1109\/TETCI.2019.2897815"},{"key":"e_1_3_3_133_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01236"},{"key":"e_1_3_3_134_2","first-page":"1873","volume-title":"IEEE International Geoscience and Remote Sensing Symposium","author":"Vakalopoulou M.","year":"2015","unstructured":"M. Vakalopoulou, K. Karantzalos, N. Komodakis, and N. Paragios. 2015. Building detection in very high resolution multispectral data with deep learning features. In IEEE International Geoscience and Remote Sensing Symposium. 1873\u20131876."},{"key":"e_1_3_3_135_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41591-021-01343-4"},{"key":"e_1_3_3_136_2","article-title":"You only look twice: Rapid multi-scale object detection in satellite imagery","author":"Etten Adam Van","year":"2018","unstructured":"Adam Van Etten. 2018. You only look twice: Rapid multi-scale object detection in satellite imagery. arXiv preprint arXiv:1805.09512 (2018).","journal-title":"arXiv preprint arXiv:1805.09512"},{"key":"e_1_3_3_137_2","volume-title":"International Conference on Learning Representations","author":"Velickovic Petar","year":"2018","unstructured":"Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Li\u00f2, et\u00a0al. 2018. Graph attention networks. 
In International Conference on Learning Representations."},{"key":"e_1_3_3_138_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2019.02.012"},{"key":"e_1_3_3_139_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2983686"},{"key":"e_1_3_3_140_2","first-page":"1396","volume-title":"IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","author":"Wang Kun","year":"2019","unstructured":"Kun Wang, Xiaohong Zhang, and Sheng Huang. 2019. KGZNet: Knowledge-guided deep zoom neural networks for thoracic disease classification. In IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 1396\u20131401."},{"key":"e_1_3_3_141_2","doi-asserted-by":"publisher","DOI":"10.1136\/bjophthalmol-2018-313706"},{"key":"e_1_3_3_142_2","doi-asserted-by":"publisher","DOI":"10.1049\/ipr2.12466"},{"key":"e_1_3_3_143_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3013269"},{"key":"e_1_3_3_144_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00720"},{"key":"e_1_3_3_145_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00333"},{"key":"e_1_3_3_146_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2020.102907"},{"key":"e_1_3_3_147_2","unstructured":"Lilian Weng and Greg Brockman. 2022. Techniques for Training Large Neural Networks. https:\/\/openai.com\/blog\/techniques-for-training-large-neural-networks\/"},{"key":"e_1_3_3_148_2","first-page":"843","volume-title":"Conference on Medical Imaging with Deep Learning","author":"Xie Chensu","year":"2020","unstructured":"Chensu Xie, Hassan Muhammad, Chad M. Vanderbilt, Raul Caso, Dig Vijay Kumar Yarlagadda, et\u00a0al. 2020. Beyond classification: Whole slide tissue histopathology analysis by end-to-end part learning. In Conference on Medical Imaging with Deep Learning. 
843\u2013856."},{"key":"e_1_3_3_149_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMI.2020.3010102"},{"key":"e_1_3_3_150_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00847"},{"key":"e_1_3_3_151_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00181"},{"key":"e_1_3_3_152_2","doi-asserted-by":"publisher","DOI":"10.3390\/rs10091461"},{"key":"e_1_3_3_153_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00496"},{"key":"e_1_3_3_154_2","doi-asserted-by":"publisher","DOI":"10.21037\/atm.2020.03.132"},{"key":"e_1_3_3_155_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.596"},{"key":"e_1_3_3_156_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00271"},{"key":"e_1_3_3_157_2","first-page":"12992","volume-title":"Advances in Neural Information Processing Systems","author":"Yu Qihang","year":"2021","unstructured":"Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan L. Yuille, et\u00a0al. 2021. Glance-and-gaze vision transformer. In Advances in Neural Information Processing Systems, Vol. 34. 12992\u201313003."},{"key":"e_1_3_3_158_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCPHOT.2017.7951481"},{"key":"e_1_3_3_159_2","first-page":"7281","volume-title":"Advances in Neural Information Processing Systems","author":"Yuan Yuhui","year":"2021","unstructured":"Yuhui Yuan, Rao Fu, Lang Huang, Weihong Lin, Chao Zhang, et\u00a0al. 2021. HRFormer: High-resolution vision transformer for dense predict. In Advances in Neural Information Processing Systems, Vol. 34. 
7281\u20137293."},{"key":"e_1_3_3_160_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01183"},{"key":"e_1_3_3_161_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00299"},{"key":"e_1_3_3_162_2","doi-asserted-by":"publisher","DOI":"10.3390\/rs12030417"},{"key":"e_1_3_3_163_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.70"},{"key":"e_1_3_3_164_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10599-4_7"},{"key":"e_1_3_3_165_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01219-9_25"},{"key":"e_1_3_3_166_2","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2021.3070383"},{"key":"e_1_3_3_167_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.544"},{"issue":"7","key":"e_1_3_3_168_2","first-page":"3602","article-title":"Locality-aware crowd counting","volume":"44","author":"Zhou Joey Tianyi","year":"2021","unstructured":"Joey Tianyi Zhou, Le Zhang, Du Jiawei, Xi Peng, Zhiwen Fang, et\u00a0al. 2021. Locality-aware crowd counting. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 7 (2021), 3602\u20133613.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3645107","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3645107","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:03:27Z","timestamp":1750291407000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3645107"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,9]]},"references-count":167,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,7,31]]}},"alternative-id":["10.1145\/3645107"],"URL":"https:\/\/doi.org\/10.1145\/3645107","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,9]]},"assertion":[{"value":"2022-12-06","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-31","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}