{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:09:05Z","timestamp":1760238545975,"version":"build-2065373602"},"reference-count":35,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2022,6,8]],"date-time":"2022-06-08T00:00:00Z","timestamp":1654646400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada (NSERC)","doi-asserted-by":"publisher","award":["06487-2017","NFRFE-2018-00484"],"award-info":[{"award-number":["06487-2017","NFRFE-2018-00484"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000155","name":"Government of Canada\u2019s New Frontiers in Research Fund (NFRF)","doi-asserted-by":"publisher","award":["06487-2017","NFRFE-2018-00484"],"award-info":[{"award-number":["06487-2017","NFRFE-2018-00484"]}],"id":[{"id":"10.13039\/501100000155","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Composing the representation of a sentence from the tokens that it comprises is difficult, because such a representation needs to account for how the words present relate to each other. The Transformer architecture does this by iteratively changing token representations with respect to one another. This has the drawback of requiring computation that grows quadratically with respect to the number of tokens. Furthermore, the scalar attention mechanism used by Transformers requires multiple sets of parameters to operate over different features. The present paper proposes a lighter algorithm for sentence representation with complexity linear in sequence length. 
This algorithm begins with a presumably erroneous value of a context vector and adjusts this value with respect to the tokens at hand. In order to achieve this, representations of words are built combining their symbolic embedding with a positional encoding into single vectors. The algorithm then iteratively weighs and aggregates these vectors using a second-order attention mechanism, which allows different feature pairs to interact with each other separately. Our models report strong results in several well-known text classification tasks.<\/jats:p>","DOI":"10.3390\/info13060290","type":"journal-article","created":{"date-parts":[[2022,6,10]],"date-time":"2022-06-10T02:25:33Z","timestamp":1654827933000},"page":"290","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Contextualizer: Connecting the Dots of Context with Second-Order Attention"],"prefix":"10.3390","volume":"13","author":[{"given":"Diego","family":"Maupom\u00e9","sequence":"first","affiliation":[{"name":"Department of Computer Science, Faculty of Sciences, Universit\u00e9 du Qu\u00e9bec \u00e0 Montr\u00e9al, Montreal, QC H3C 3P8, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8196-2153","authenticated-orcid":false,"given":"Marie-Jean","family":"Meurs","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Faculty of Sciences, Universit\u00e9 du Qu\u00e9bec \u00e0 Montr\u00e9al, Montreal, QC H3C 3P8, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,8]]},"reference":[{"key":"ref_1","unstructured":"Manning, C., and Schutze, H. (1999). 
Foundations of Statistical Natural Language Processing, MIT Press."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"153","DOI":"10.3389\/frobt.2019.00153","article-title":"Symbolic, distributed, and distributional representations for natural language processing in the era of deep learning: A survey","volume":"70","author":"Ferrone","year":"2020","journal-title":"Front. Robot. AI"},{"key":"ref_3","unstructured":"Socher, R., Manning, C.D., and Ng, A.Y. (2010, January 10). Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks. Proceedings of the NIPS\u20142010 Deep Learning and Unsupervised Feature Learning Workshop, Whistler, BC, Canada."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Bowman, S.R., Potts, C., and Manning, C.D. (2015, January 31). Recursive Neural Networks Can Learn Logical Semantics. Proceedings of the 3rd Workshop on Continuous Vector Space Models and Their Compositionality, Beijing, China.","DOI":"10.18653\/v1\/W15-4002"},{"key":"ref_5","first-page":"5998","article-title":"Attention is All You Need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Cheng, J., Dong, L., and Lapata, M. (2016). Long Short-Term Memory-Networks for Machine Reading. arXiv.","DOI":"10.18653\/v1\/D16-1053"},{"key":"ref_7","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Cer, D., Yang, Y., Kong, S.y., Hua, N., Limtiaco, N., John, R.S., Constant, N., Guajardo-Cespedes, M., Yuan, S., and Tar, C. (November, January 31). Universal Sentence Encoder for English. 
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium.","DOI":"10.18653\/v1\/D18-2029"},{"key":"ref_9","unstructured":"Clark, K., Luong, M.T., Le, Q.V., and Manning, C.D. (2020). ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"681","DOI":"10.1007\/s11023-020-09548-1","article-title":"GPT-3: Its nature, scope, limits, and consequences","volume":"30","author":"Floridi","year":"2020","journal-title":"Minds Mach."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., and Bowman, S.R. (2019). GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. arXiv.","DOI":"10.18653\/v1\/W18-5446"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Maupom\u00e9, D., Rancourt, F., Armstrong, M.D., and Meurs, M.J. (2021, January 25\u201328). Position Encoding Schemes for Linear Aggregation of Word Sequences. Proceedings of the Canadian Conference on Artificial Intelligence, Vancouver, BC, Canada.","DOI":"10.21428\/594757db.37d7654d"},{"key":"ref_13","unstructured":"Rabanser, S., Shchur, O., and G\u00fcnnemann, S. (2017). Introduction to Tensor Decompositions and their applications in Machine Learning. arXiv."},{"key":"ref_14","unstructured":"Sutskever, I., Martens, J., and Hinton, G.E. (July, January 28). Generating Text with Recurrent Neural Networks. Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA."},{"key":"ref_15","unstructured":"Maupom\u00e9, D., and Meurs, M.J. (2020, January 11\u201316). Language Modeling with a General Second-Order RNN. Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_16","unstructured":"Ba, J.L., Kiros, J.R., and Hinton, G.E. (2016). Layer Normalization. 
arXiv."},{"key":"ref_17","first-page":"4055","article-title":"Image Transformer","volume":"Volume 80","author":"Dy","year":"2018","journal-title":"Proceedings of the 35th International Conference on Machine Learning"},{"key":"ref_18","unstructured":"Beltagy, I., Peters, M.E., and Cohan, A. (2020). Longformer: The Long-Document Transformer. arXiv."},{"key":"ref_19","unstructured":"Chelba, C., Chen, M., Bapna, A., and Shazeer, N. (2020). Faster Transformer Decoding: N-gram Masked Self-Attention. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q.V., and Salakhutdinov, R. (2019). Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. arXiv.","DOI":"10.18653\/v1\/P19-1285"},{"key":"ref_21","unstructured":"Kitaev, N., Kaiser, \u0141., and Levskaya, A. (2020). Reformer: The Efficient Transformer. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Roy, A., Saffar, M., Vaswani, A., and Grangier, D. (2020). Efficient Content-Based Sparse Attention with Routing Transformers. arXiv.","DOI":"10.1162\/tacl_a_00353"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Bai, J., Wang, Y., Chen, Y., Yang, Y., Bai, J., Yu, J., and Tong, Y. (2021). Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees. arXiv.","DOI":"10.18653\/v1\/2021.eacl-main.262"},{"key":"ref_24","first-page":"5156","article-title":"Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention","volume":"Volume 119","author":"Singh","year":"2020","journal-title":"Proceedings of the 37th International Conference on Machine Learning"},{"key":"ref_25","unstructured":"Choromanski, K., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., Hawkins, P., Davis, J., Mohiuddin, A., and Kaiser, L. (2021). Rethinking Attention with Performers. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Pang, B., and Lee, L. (2005, January 25\u201330). 
Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI, USA.","DOI":"10.3115\/1219840.1219855"},{"key":"ref_27","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Schuster, M., and Nakajima, K. (2012, January 25\u201330). Japanese and Korean voice search. Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan. ISSN 2379-190X.","DOI":"10.1109\/ICASSP.2012.6289079"},{"key":"ref_29","unstructured":"Merity, S., Xiong, C., Bradbury, J., and Socher, R. (2016). Pointer Sentinel Mixture Models. arXiv."},{"key":"ref_30","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Pang, B., and Lee, L. (2004, January 21\u201326). A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain.","DOI":"10.3115\/1218955.1218990"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Hu, M., and Liu, B. (2004, January 22\u201325). Mining and Summarizing Customer Reviews. Proceedings of the KDD\u201904: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA.","DOI":"10.1145\/1014052.1014073"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1007\/s10579-005-7880-9","article-title":"Annotating Expressions of Opinions and Emotions in Language","volume":"39","author":"Wiebe","year":"2005","journal-title":"Lang. Resour. 
Eval."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"20539517211047734","DOI":"10.1177\/20539517211047734","article-title":"The great Transformer: Examining the role of large language models in the political economy of AI","volume":"8","author":"Luitse","year":"2021","journal-title":"Big Data Soc."},{"key":"ref_35","unstructured":"Strubell, E., Ganesh, A., and McCallum, A. (August, January 28). Energy and Policy Considerations for Deep Learning in NLP. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/6\/290\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:25:53Z","timestamp":1760138753000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/6\/290"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,8]]},"references-count":35,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["info13060290"],"URL":"https:\/\/doi.org\/10.3390\/info13060290","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2022,6,8]]}}}