{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T17:31:29Z","timestamp":1778693489850,"version":"3.51.4"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:p>Crowd counting methods typically predict a density map as an intermediate representation for counting and achieve good performance. However, perspective effects cause scale variation in real scenes, so density map-based methods suffer from a severe scene generalization problem, because only a limited number of scales are fitted in density map prediction and generation. To address this issue, we propose a novel vision transformer network, CrowdFormer, and a density kernels fusion framework for more accurate density map estimation and generation, respectively. We then incorporate these two innovations into an adaptive learning system that takes both the annotation dot map and the original image as input and jointly learns the density map estimator and generator within an end-to-end framework. Experimental results demonstrate that the proposed model achieves state-of-the-art performance in terms of MAE and MSE (e.g., an MAE of 67.1 and an MSE of 301.6 on the NWPU-Crowd dataset) and confirm the effectiveness of the two proposed designs. The code is available at https:\/\/github.com\/special-yang\/Top_Down-CrowdCounting.<\/jats:p>","DOI":"10.24963\/ijcai.2022\/215","type":"proceedings-article","created":{"date-parts":[[2022,7,16]],"date-time":"2022-07-16T02:55:56Z","timestamp":1657940156000},"page":"1545-1551","source":"Crossref","is-referenced-by-count":36,"title":["CrowdFormer: An Overlap Patching Vision Transformer for Top-Down Crowd Counting"],"prefix":"10.24963","author":[{"given":"Shaopeng","family":"Yang","sequence":"first","affiliation":[{"name":"Watrix Technology Co. LTD."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weiyu","family":"Guo","sequence":"additional","affiliation":[{"name":"Information School, Central University of Finance and Economics, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuheng","family":"Ren","sequence":"additional","affiliation":[{"name":"Watrix Technology Co. LTD."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"name":"Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}","theme":"Artificial Intelligence","location":"Vienna, Austria","acronym":"IJCAI-2022","number":"31","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2022,7,23]]},"end":{"date-parts":[[2022,7,29]]}},"container-title":["Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2022,7,18]],"date-time":"2022-07-18T11:08:24Z","timestamp":1658142504000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2022\/215"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2022,7]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2022\/215","relation":{},"subject":[],"published":{"date-parts":[[2022,7]]}}}