{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,18]],"date-time":"2024-10-18T04:28:38Z","timestamp":1729225718165,"version":"3.27.0"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643685489","type":"electronic"}],"license":[{"start":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T00:00:00Z","timestamp":1729036800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,10,16]]},"abstract":"<jats:p>In computer vision, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, its quadratic complexity limits its applicability to tasks that benefit from high-resolution input. In this work, we extend Hyena, a convolution-based attention replacement, from causal sequences to bidirectional data and two-dimensional image space. We scale Hyena\u2019s convolution kernels beyond the feature map size, up to 191\u00d7191, to maximize ERF while maintaining sub-quadratic complexity in the number of pixels. We integrate our two-dimensional Hyena, HyenaPixel, and bidirectional Hyena into the MetaFormer framework. For image categorization, HyenaPixel and bidirectional Hyena achieve a competitive ImageNet-1k top-1 accuracy of 84.9% and 85.2%, respectively, with no additional training data, while outperforming other convolutional and large-kernel networks. Combining HyenaPixel with attention further improves accuracy. We attribute the success of bidirectional Hyena to learning the data-dependent geometric arrangement of pixels without a fixed neighborhood definition. Experimental results on downstream tasks suggest that HyenaPixel with large filters and a fixed neighborhood leads to better localization performance.<\/jats:p>","DOI":"10.3233\/faia240529","type":"book-chapter","created":{"date-parts":[[2024,10,17]],"date-time":"2024-10-17T12:46:37Z","timestamp":1729169197000},"source":"Crossref","is-referenced-by-count":0,"title":["HyenaPixel: Global Image Context with Convolutions"],"prefix":"10.3233","author":[{"given":"Julian","family":"Spravil","sequence":"first","affiliation":[{"name":"Fraunhofer IAIS, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sebastian","family":"Houben","sequence":"additional","affiliation":[{"name":"University of Applied Sciences Bonn-Rhein-Sieg, Germany"},{"name":"Fraunhofer IAIS, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sven","family":"Behnke","sequence":"additional","affiliation":[{"name":"University of Bonn, Computer Science Institute VI, Center for Robotics, Germany"},{"name":"Lamarr Institute for Machine Learning and Artificial Intelligence, Germany"},{"name":"Fraunhofer IAIS, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 2024"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA240529","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,17]],"date-time":"2024-10-17T12:46:38Z","timestamp":1729169198000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA240529"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,16]]},"ISBN":["9781643685489"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia240529","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,16]]}}}