{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,15]],"date-time":"2026-02-15T03:18:11Z","timestamp":1771125491292,"version":"3.50.1"},"reference-count":45,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Science Foundation of China","doi-asserted-by":"publisher","award":["61901079"],"award-info":[{"award-number":["61901079"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Science Foundation of China","doi-asserted-by":"publisher","award":["61403110308"],"award-info":[{"award-number":["61403110308"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"General Project Fund in the Field of Equipment Development Department","award":["61901079"],"award-info":[{"award-number":["61901079"]}]},{"name":"General Project Fund in the Field of Equipment Development Department","award":["61403110308"],"award-info":[{"award-number":["61403110308"]}]},{"name":"Dalian University","award":["61901079"],"award-info":[{"award-number":["61901079"]}]},{"name":"Dalian University","award":["61403110308"],"award-info":[{"award-number":["61403110308"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Since introducing the Transformer model, it has dramatically influenced various fields of machine learning. The field of time series prediction has also been significantly impacted, where Transformer family models have flourished, and many variants have been differentiated. 
These Transformer models mainly use attention mechanisms for feature extraction and multi-head attention mechanisms to strengthen that extraction. However, multi-head attention is essentially a simple superposition of the same attention, so it does not guarantee that the model captures different features. Conversely, multi-head attention mechanisms may lead to considerable information redundancy and wasted computational resources. To ensure that the Transformer can capture information from multiple perspectives and increase the diversity of its captured features, this paper proposes, for the first time, a hierarchical attention mechanism that addresses two shortcomings of traditional multi-head attention: the insufficient diversity of the captured information and the lack of information interaction among the heads. Additionally, global feature aggregation using graph networks is employed to mitigate inductive bias. Finally, we conducted experiments on four benchmark datasets, and the results show that the proposed model outperforms the baseline model on several metrics.<\/jats:p>","DOI":"10.3390\/s23115093","type":"journal-article","created":{"date-parts":[[2023,5,27]],"date-time":"2023-05-27T16:17:33Z","timestamp":1685204253000},"page":"5093","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Metaformer: A Transformer That Tends to Mine Metaphorical-Level Information"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1183-9165","authenticated-orcid":false,"given":"Bo","family":"Peng","sequence":"first","affiliation":[{"name":"Communication and Network Laboratory, Dalian University, Dalian 116622, China"},{"name":"School of Information Engineering, Dalian University, Dalian 116622, China"}]},{"given":"Yuanming","family":"Ding","sequence":"additional","affiliation":[{"name":"Communication and Network Laboratory, Dalian 
University, Dalian 116622, China"},{"name":"School of Information Engineering, Dalian University, Dalian 116622, China"}]},{"given":"Wei","family":"Kang","sequence":"additional","affiliation":[{"name":"Communication and Network Laboratory, Dalian University, Dalian 116622, China"},{"name":"School of Information Engineering, Dalian University, Dalian 116622, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,5,26]]},"reference":[{"key":"ref_1","unstructured":"Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Wang, Q., Li, B., Xiao, T., Zhu, J., Li, C., Wong, D.F., and Chao, L.S. (2019). Learning deep transformer models for machine translation. arXiv.","DOI":"10.18653\/v1\/P19-1176"},{"key":"ref_3","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2021, January 23). Attention Is All You Need. Available online: http:\/\/arxiv.org\/abs\/1706.03762."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Lin, T., Wang, Y., Liu, X., and Qiu, X. (2022). A survey of transformers. AI Open.","DOI":"10.1016\/j.aiopen.2022.10.001"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_6","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_7","unstructured":"Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A., and Tran, D. 
(2018, January 10\u201315). Image transformer. Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_8","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Chen, X., Wu, Y., Wang, Z., Liu, S., and Li, J. (2021, January 6\u201311). Developing real-time streaming transformer transducer for speech recognition on large-scale dataset. Proceedings of the ICASSP 2021\u20142021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9413535"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Dong, L., Xu, S., and Xu, B. (2018, January 15\u201320). Speech-transformer: A no-recurrence sequence-to-sequence model for speech recognition. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8462506"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Gulati, A., Qin, J., Chiu, C.C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., and Wu, Y. (2020). Conformer: Convolution-augmented transformer for speech recognition. arXiv.","DOI":"10.21437\/Interspeech.2020-3015"},{"key":"ref_12","unstructured":"Huang, C.Z.A., Vaswani, A., Uszkoreit, J., Shazeer, N., Simon, I., Hawthorne, C., Dai, A.M., Hoffman, M.D., Dinculescu, M., and Eck, D. (2018). Music transformer. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Huang, Y.S., and Yang, Y.H. (2020, January 12\u201316). Pop music transformer: Beat-based modeling and generation of expressive pop piano compositions. 
Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413671"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1572","DOI":"10.1021\/acscentsci.9b00576","article-title":"Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction","volume":"5","author":"Schwaller","year":"2019","journal-title":"ACS Cent. Sci."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_17","first-page":"91","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"1","author":"Ren","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation applied to handwritten zip code recognition","volume":"1","author":"LeCun","year":"1989","journal-title":"Neural Comput."},{"key":"ref_19","first-page":"68","article-title":"Stand-alone self-attention in vision models","volume":"32","author":"Ramachandran","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_20","unstructured":"Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., and Sutskever, I. (2020, January 13\u201318). Generative pretraining from pixels. 
Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_21","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable detr: Deformable transformers for end-to-end object detection. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., and Torr, P.H. (2021, January 20\u201325). Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"ref_23","unstructured":"Van Der Westhuizen, J., and Lasenby, J. (2018). The unreasonable effectiveness of the forget gate. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Graves, A., and Graves, A. (2012). Supervised Sequence Labelling with Recurrent Neural Networks, Springer.","DOI":"10.1007\/978-3-642-24797-2"},{"key":"ref_25","unstructured":"Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"5929","DOI":"10.1007\/s10462-020-09838-1","article-title":"A review on the long short-term memory model","volume":"53","author":"Mosquera","year":"2020","journal-title":"Artif. Intell. 
Rev."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"98053","DOI":"10.1109\/ACCESS.2019.2929692","article-title":"T-LSTM: A long short-term memory neural network enhanced by temporal information for traffic flow prediction","volume":"7","author":"Mou","year":"2019","journal-title":"IEEE Access"},{"key":"ref_29","unstructured":"Pascanu, R., Mikolov, T., and Bengio, Y. (2013, January 16\u201321). On the difficulty of training recurrent neural networks. Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA."},{"key":"ref_30","first-page":"3104","article-title":"Sequence to sequence learning with neural networks","volume":"27","author":"Sutskever","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","unstructured":"Kitaev, N., Kaiser, \u0141., and Levskaya, A. (2020). Reformer: The efficient transformer. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., and Zhang, W. (2021, January 2\u20139). Informer: Beyond efficient transformer for long sequence time-series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual.","DOI":"10.1609\/aaai.v35i12.17325"},{"key":"ref_33","first-page":"22419","article-title":"Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting","volume":"34","author":"Wu","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_34","unstructured":"Zhou, T., Ma, Z., Wen, Q., Wang, X., Sun, L., and Jin, R. (2022). FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. arXiv."},{"key":"ref_35","unstructured":"Chollet, F. (2019). On the measure of intelligence. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1613\/jair.731","article-title":"A model of inductive bias learning","volume":"12","author":"Baxter","year":"2000","journal-title":"J. Artif. Intell. 
Res."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1145\/2347736.2347755","article-title":"A few useful things to know about machine learning","volume":"55","author":"Domingos","year":"2012","journal-title":"Commun. ACM"},{"key":"ref_38","unstructured":"Kipf, T.N., and Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv."},{"key":"ref_39","unstructured":"Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. (2015). Gated graph sequence neural networks. arXiv."},{"key":"ref_40","first-page":"1025","article-title":"Inductive representation learning on large graphs","volume":"30","author":"Hamilton","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_41","unstructured":"Veli\u010dkovi\u0107, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2017). Graph attention networks. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"793","DOI":"10.1007\/s12559-023-10110-1","article-title":"Dialogue relation extraction with document-level heterogeneous graph attention networks","volume":"15","author":"Chen","year":"2023","journal-title":"Cogn. Comput."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Lai, G., Chang, W.C., Yang, Y., and Liu, H. (2018, January 8\u201312). Modeling long-and short-term temporal patterns with deep neural networks. Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA.","DOI":"10.1145\/3209978.3210006"},{"key":"ref_44","first-page":"5243","article-title":"Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting","volume":"32","author":"Li","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_45","unstructured":"Hao, H., Wang, Y., Xia, Y., Zhao, J., and Shen, F. (2020). Temporal convolutional attention-based network for sequence modeling. 
arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/11\/5093\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:42:38Z","timestamp":1760125358000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/11\/5093"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,26]]},"references-count":45,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2023,6]]}},"alternative-id":["s23115093"],"URL":"https:\/\/doi.org\/10.3390\/s23115093","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,26]]}}}