{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T07:19:08Z","timestamp":1777706348639,"version":"3.51.4"},"reference-count":51,"publisher":"SAGE Publications","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IFS"],"published-print":{"date-parts":[[2022,8,10]]},"abstract":"<jats:p>The location attention mechanism has been widely applied in deep neural networks. However, as the mechanism entails heavy computing workload, significant memories consumed for weights storage, and shows poor parallelism in some calculations, it is hard to achieve high efficiency deployment. In this paper, the field-programmable gate array (FPGA) is employed to implement the location attention mechanism in hardware, and a novel fusion approach is proposed to connect the convolutional layer with the fully connected layer, which not only improves the parallelism of both the algorithm and the hardware pipeline, but also reduces the computation cost for such operations as multiplication and addition. Meanwhile, the shared computing architecture is used to reduce the demand for hardware resources. Parallel computing arrays are utilized to time-multiplex a single computing array, which can speed up the pipeline parallel computing of the attention mechanism. Experimental results show that for the location attention mechanism, the FPGA\u2019s inference speed is 0.010\u200ams, which is around a quarter of the speed achieved by running it with GPU, and its power consumption is 1.73\u200aW, which is about 2.89% of the power consumed by running it with CPU. Compared with other FPGA implementation methods of attention mechanism, it has less hardware resource consumption and less inference time. When applied to speech recognition tasks, the trained attention model is symmetrically quantized and deployed on the FPGA. 
The results show that the word error rate is only 0.79% higher than before quantization, demonstrating the effectiveness and correctness of the hardware circuit.<\/jats:p>","DOI":"10.3233\/jifs-212273","type":"journal-article","created":{"date-parts":[[2022,3,1]],"date-time":"2022-03-01T13:23:58Z","timestamp":1646141038000},"page":"5309-5323","source":"Crossref","is-referenced-by-count":1,"title":["FPGA-based design and implementation of the location attention mechanism in neural networks"],"prefix":"10.1177","volume":"43","author":[{"given":"Ruixiu","family":"Qiao","sequence":"first","affiliation":[{"name":"Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China"},{"name":"University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Xiaozhou","family":"Guo","sequence":"additional","affiliation":[{"name":"Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China"},{"name":"University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Wenyu","family":"Mao","sequence":"additional","affiliation":[{"name":"Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China"},{"name":"University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Jixing","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China"},{"name":"University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Huaxiang","family":"Lu","sequence":"additional","affiliation":[{"name":"Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China"},{"name":"University of Chinese Academy of Sciences, Beijing, China"},{"name":"CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China"},{"name":"Semiconductor Neural Network Intelligent Perception and Computing Technology Beijing Key Lab, Beijing, 
China"}]}],"member":"179","reference":[{"key":"10.3233\/JIFS-212273_ref1","first-page":"91","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks[J]","volume":"28","author":"Ren","year":"2015","journal-title":"Advances in Neural Information Processing Systems"},{"key":"10.3233\/JIFS-212273_ref2","unstructured":"Devlin J. , Chang M.W. , Lee K. , et al., Bert: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018."},{"key":"10.3233\/JIFS-212273_ref3","first-page":"4960","article-title":"Listen, attend and spell: A neural network for large vocabulary conversational speech recognition[C]\/\/IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Chan","year":"2016","journal-title":"IEEE"},{"key":"10.3233\/JIFS-212273_ref4","doi-asserted-by":"crossref","unstructured":"Cho K. , Van Merri\u00ebnboer B. , GulcehreC., et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv preprint arXiv:1406.1078, 2014.","DOI":"10.3115\/v1\/D14-1179"},{"key":"10.3233\/JIFS-212273_ref5","doi-asserted-by":"crossref","unstructured":"Manaswi N.K. 
, Deep learning with applications using python: chatbots and face, object, and speech recognition with tensorflow and keras[M], Apress, 2018.","DOI":"10.1007\/978-1-4842-3516-4_7"},{"key":"10.3233\/JIFS-212273_ref6","first-page":"806","article-title":"CNN features off-the-shelf: an astounding baseline for recognition[C]\/\/","author":"Sharif Razavian","year":"2014","journal-title":"Proceedings of the IEEE conference on computer vision and pattern recognition workshops"},{"issue":"10","key":"10.3233\/JIFS-212273_ref7","doi-asserted-by":"crossref","first-page":"2222","DOI":"10.1109\/TNNLS.2016.2582924","article-title":"LSTM: A search space odyssey[J]","volume":"28","author":"Greff","year":"2016","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"10.3233\/JIFS-212273_ref8","unstructured":"Chung J. , Gulcehre C. , Cho K.H. , et al., Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv preprint arXiv:1412.3555, 2014."},{"key":"10.3233\/JIFS-212273_ref9","first-page":"5998","article-title":"Attention is all you need[C]\/\/","author":"Vaswani","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"10.3233\/JIFS-212273_ref10","doi-asserted-by":"crossref","unstructured":"Luong M.T. , Pham H. and Manning C.D. , Effective approaches to attention-based neural machine translation[J], arXiv preprint arXiv:1508.04025, 2015.","DOI":"10.18653\/v1\/D15-1166"},{"key":"10.3233\/JIFS-212273_ref11","unstructured":"Golub D. and He X. , Character-level question answering with attention[J]. 
arXiv preprint arXiv:1604.00727, 2016."},{"key":"10.3233\/JIFS-212273_ref12","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1016\/j.future.2020.08.005","article-title":"ABCDM: An attention-based bidirectional CNN-RNN deep model for sentiment analysis[J]","volume":"115","author":"Basiri","year":"2021","journal-title":"Future Generation Computer Systems"},{"key":"10.3233\/JIFS-212273_ref13","first-page":"2073","article-title":"Joint Chinese Word Segmentation and Part-of-speech Tagging via Multi-channel Attention of Character N-grams[C]\/\/","author":"Tian","year":"2020","journal-title":"Proceedings of the 28th International Conference on Computational Linguistics"},{"issue":"1","key":"10.3233\/JIFS-212273_ref14","first-page":"75","article-title":"Nonintrusive load monitoring based on sequence-to-sequence model with attention mechanism[C]\/\/","volume":"39","author":"Wang","year":"2019","journal-title":"Zhongguo Dianji Gongcheng Xuebao\/Proceedings of the Chinese Society of Electrical Engineering"},{"key":"10.3233\/JIFS-212273_ref15","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1109\/ISSPIT.2018.8642767","article-title":"SAM-GCNN: A gated convolutional neural network with segment-level attention mechanism for home activity monitoring[C]\/\/","author":"Shen","year":"2018","journal-title":"2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE"},{"key":"10.3233\/JIFS-212273_ref16","first-page":"432","article-title":"An introductory survey on attention mechanisms in NLP problems[C]\/\/","author":"Hu","year":"2019","journal-title":"Proceedings of SAI Intelligent Systems Conference. Springer, Cham"},{"key":"10.3233\/JIFS-212273_ref17","unstructured":"Chaudhari S. , Mithal V. , Polatkan G. 
, et al., An attentive survey of attention models[J], arXiv preprint arXiv:1904.02874, 2019."},{"key":"10.3233\/JIFS-212273_ref18","first-page":"3519","article-title":"LSTM, GRU, highway and a bit of attention: an empirical overview for language modeling in speech recognition[C]\/\/","author":"Irie","year":"2016","journal-title":"Interspeech"},{"key":"10.3233\/JIFS-212273_ref19","first-page":"2204","article-title":"Recurrent models of visual attention[C]\/\/","author":"Mnih","year":"2014","journal-title":"Advances in Neural Information Processing Systems"},{"key":"10.3233\/JIFS-212273_ref20","first-page":"842","article-title":"The application of two-level attention models in deep convolutional neural network for fine-grained image classification[C]\/\/","author":"Xiao","year":"2015","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern recognition"},{"key":"10.3233\/JIFS-212273_ref21","first-page":"2017","article-title":"Spatial transformer networks[J]","volume":"28","author":"Jaderberg","year":"2015","journal-title":"Advances in Neural Information Processing Systems"},{"key":"10.3233\/JIFS-212273_ref22","first-page":"7132","article-title":"Squeeze-and-excitation networks[C]\/\/","author":"Hu","year":"2018","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"10.3233\/JIFS-212273_ref23","unstructured":"Li H. , Xiong P. , An J. 
, et al., Pyramid attention network for semantic segmentation[J], arXiv preprint arXiv:1805.10180, 2018."},{"key":"10.3233\/JIFS-212273_ref24","first-page":"3","article-title":"Cbam: Convolutional block attention module[C]\/\/","author":"Woo","year":"2018","journal-title":"Proceedings of the European conference on computer vision (ECCV)"},{"key":"10.3233\/JIFS-212273_ref25","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1016\/j.procs.2021.02.068","article-title":"Research and application of semantic understanding based on Attention-RNN[J]","volume":"183","author":"Du","year":"2021","journal-title":"Procedia Computer Science"},{"key":"10.3233\/JIFS-212273_ref26","unstructured":"Chorowski J. , Bahdanau D. , Serdyuk D. , et al., Attention-based models for speech recognition[J], arXiv preprint arXiv:1506.07503, 2015."},{"key":"10.3233\/JIFS-212273_ref27","doi-asserted-by":"crossref","first-page":"4835","DOI":"10.1109\/ICASSP.2017.7953075","article-title":"Joint CTC-attention based end-to-end speech recognition using multi-task learning[C]\/\/","author":"Kim","year":"2017","journal-title":"2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE"},{"key":"10.3233\/JIFS-212273_ref28","doi-asserted-by":"crossref","unstructured":"Inaguma H. and Kawahara T. , Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition[J], arXiv preprint arXiv:2103.00422, 2021.","DOI":"10.1109\/TASLP.2021.3133217"},{"issue":"07","key":"10.3233\/JIFS-212273_ref29","first-page":"9","article-title":"Machine Translation System Based on Self-Attention Model[J]","volume":"2019","author":"Yan","journal-title":"Computer and Modernization"},{"key":"10.3233\/JIFS-212273_ref30","first-page":"1","article-title":"Trends of CPU, GPU and FPGA for high-performance computing[C]\/\/","author":"Vestias","year":"2014","journal-title":"2014 24th International Conference on Field Programmable Logic and Applications (FPL). 
IEEE"},{"key":"10.3233\/JIFS-212273_ref31","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1109\/FPT.2016.7929192","article-title":"Accelerating binarized neural networks: Comparison of FPGA, CPU, GPU, and ASIC[C]\/\/","author":"Nurvitadhi","year":"2016","journal-title":"2016 International Conference on Field-Programmable Technology (FPT). IEEE"},{"key":"10.3233\/JIFS-212273_ref32","doi-asserted-by":"crossref","first-page":"288","DOI":"10.1109\/ISVLSI.2010.84","article-title":"Blas comparison on fpga, cpu and gpu[C]\/\/","author":"Kestur","year":"2010","journal-title":"2010 IEEE computer society annual symposium on VLSI. IEEE"},{"key":"10.3233\/JIFS-212273_ref33","first-page":"1","article-title":"Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC[C]\/\/","author":"Nurvitadhi","year":"2016","journal-title":"2016 26th International Conference on Field Programmable Logic and Applications (FPL). IEEE"},{"key":"10.3233\/JIFS-212273_ref34","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1007\/978-3-319-56258-2_22","article-title":"Optimizing CNN-based object detection algorithms on embedded FPGA platforms[C]\/\/","author":"Zhao","year":"2017","journal-title":"International Symposium on Applied Reconfigurable Computing. Springer, Cham"},{"key":"10.3233\/JIFS-212273_ref35","first-page":"5325","article-title":"A convolutional neural network cascade for face detection[C]\/\/","author":"Li","year":"2015","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"10.3233\/JIFS-212273_ref36","first-page":"1","article-title":"Real-time road segmentation using lidar data processing on an fpga[C]\/\/","author":"Lyu","year":"2018","journal-title":"2018 IEEE International Symposium on Circuits and Systems (ISCAS). 
IEEE"},{"key":"10.3233\/JIFS-212273_ref37","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1109\/SiPS.2016.48","article-title":"FPGA-based low-power speech recognition with recurrent neural networks[C]\/\/","author":"Lee","year":"2016","journal-title":"2016 IEEE International Workshop on Signal Processing Systems (SiPS). IEEE"},{"key":"10.3233\/JIFS-212273_ref38","first-page":"1","article-title":"FPGA implementation of spectral subtraction for in-car speech enhancement and recognition[C]\/\/","author":"Whittington","year":"2008","journal-title":"2008 2nd International Conference on Signal Processing and Communication Systems. IEEE"},{"issue":"3","key":"10.3233\/JIFS-212273_ref39","doi-asserted-by":"crossref","first-page":"987","DOI":"10.1007\/s00034-011-9355-0","article-title":"FPGA-implementation of discrete wavelet transform with application to signal denoising[J]","volume":"31","author":"Bahoura","year":"2012","journal-title":"Circuits, Systems, and Signal Processing"},{"key":"10.3233\/JIFS-212273_ref40","doi-asserted-by":"crossref","first-page":"103464","DOI":"10.1016\/j.micpro.2020.103464","article-title":"English corpus translation system based on FPGA and machine learning[J]","author":"Shengxue","year":"2020","journal-title":"Microprocessors and Microsystems"},{"key":"10.3233\/JIFS-212273_ref41","first-page":"1","article-title":"Fpga-based acceleration of word2vec using opencl[C]\/\/","author":"Ono","year":"2019","journal-title":"2019 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE"},{"key":"10.3233\/JIFS-212273_ref42","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1145\/2847263.2847265","article-title":"Going deeper with embedded fpga platform for convolutional neural network[C]\/\/","author":"Qiu","year":"2016","journal-title":"Proceedings of the 2016 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays"},{"key":"10.3233\/JIFS-212273_ref43","unstructured":"Chang A.X.M. , Martini B. and Culurciello E. 
, Recurrent neural networks hardware implementation on FPGA[J], arXiv preprint arXiv:1511.05552, 2015."},{"key":"10.3233\/JIFS-212273_ref44","first-page":"13","article-title":"Implementation of a sigmoid activation function for neural network using FPGA[C]\/\/","author":"Jamel","year":"2012","journal-title":"13th Scientific Conference of Al-Ma\u2019moon University College"},{"key":"10.3233\/JIFS-212273_ref45","first-page":"1","article-title":"Adaptation of convolution and batch normalization layer for CNN implementation on FPGA[C]\/\/","author":"Sledevic","year":"2019","journal-title":"2019 Open Conference of Electrical, Electronic and Information Sciences (eStream). IEEE"},{"key":"10.3233\/JIFS-212273_ref46","doi-asserted-by":"crossref","first-page":"171608","DOI":"10.1109\/ACCESS.2020.3023946","article-title":"FPGAN: an FPGA accelerator for graph attention networks with software and hardware co-optimization[J]","volume":"8","author":"Yan","year":"2020","journal-title":"IEEE Access"},{"key":"10.3233\/JIFS-212273_ref47","doi-asserted-by":"crossref","unstructured":"Lu S. , Wang M. , Liang S. 
, et al., Hardware Accelerator for Multi-Head Attention and Position-Wise Feed-Forward in the Transformer[J], arXiv preprint arXiv:2009.08605, 2020.","DOI":"10.1109\/SOCC49529.2020.9524802"},{"key":"10.3233\/JIFS-212273_ref48","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1145\/3370748.3406567","article-title":"Ftrans: energy-efficient acceleration of transformers using fpga[C]\/\/","author":"Li","year":"2020","journal-title":"Proceedings of the ACM\/IEEE International Symposium on Low Power Electronics and Design"},{"key":"10.3233\/JIFS-212273_ref49","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/3174243.3174258","article-title":"A customizable matrix multiplication framework for the intel harpv2 xeon+ fpga platform: A deep learning case study[C]\/\/","author":"Moss","year":"2018","journal-title":"Proceedings of the 2018 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays"},{"key":"10.3233\/JIFS-212273_ref50","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1109\/ReConFig.2009.30","article-title":"Matrix multiplication based on scalable macro-pipelined FPGA accelerator architecture[C]\/\/","author":"Jiang","year":"2009","journal-title":"2009 International Conference on Reconfigurable Computing and FPGAs. IEEE"},{"key":"10.3233\/JIFS-212273_ref51","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/HPCA47549.2020.00035","article-title":"A^3: Accelerating Attention Mechanisms in Neural Networks with Approximation[C]\/\/","author":"Ham","year":"2020","journal-title":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). 
IEEE"}],"container-title":["Journal of Intelligent &amp; Fuzzy Systems"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/JIFS-212273","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:46:38Z","timestamp":1777455998000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/JIFS-212273"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,10]]},"references-count":51,"journal-issue":{"issue":"4"},"URL":"https:\/\/doi.org\/10.3233\/jifs-212273","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,10]]}}}