{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,30]],"date-time":"2025-12-30T08:54:29Z","timestamp":1767084869456,"version":"build-2065373602"},"reference-count":50,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2022,4,22]],"date-time":"2022-04-22T00:00:00Z","timestamp":1650585600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Explainability is a key factor in Natural Language Processing (NLP), especially for legal documents, medical diagnosis, and clinical text. The attention mechanism has recently become a popular choice for providing such explainability by estimating the relative importance of input units. Recent research has revealed, however, that attention tends to misidentify irrelevant input units as important. This is because the language representation layer is initialized with pre-trained word embeddings that are not context-dependent. This lack of context-dependent knowledge in the initial layer makes it difficult for the model to concentrate on the important aspects of the input. Usually, this does not impact the performance of the model, but the resulting explanations differ from human understanding. Hence, in this paper, we propose an ensemble method that embeds logic-based information from the Tsetlin Machine into the initial representation layer of the neural network to enhance its explainability. We obtain a global clause score for each word in the vocabulary and feed it into the neural network layer as context-dependent information. 
Our experiments show that the ensemble method enhances the explainability of the attention layer without sacrificing model performance, and it even outperforms the baselines on some datasets.<\/jats:p>","DOI":"10.3390\/a15050143","type":"journal-article","created":{"date-parts":[[2022,4,23]],"date-time":"2022-04-23T08:14:06Z","timestamp":1650701646000},"page":"143","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Enhancing Attention\u2019s Explanation Using Interpretable Tsetlin Machine"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1485-0439","authenticated-orcid":false,"given":"Rohan","family":"Yadav","sequence":"first","affiliation":[{"name":"Centre for Artificial Intelligence Research, Department of Information and Communication, University of Agder, 4879 Grimstad, Norway"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9377-5654","authenticated-orcid":false,"given":"Drago\u015f","family":"Nicolae","sequence":"additional","affiliation":[{"name":"Research Institute for Artificial Intelligence \u201cMihai Dr\u0103g\u0103nescu\u201d, 050711 Bucharest, Romania"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Marshall, I.J., and Wallace, B.C. (2016, January 1\u20135). Rationale-Augmented Convolutional Neural Networks for Text Classification. 
Proceedings of the Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA.","DOI":"10.18653\/v1\/D16-1076"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"189","DOI":"10.18653\/v1\/P17-1018","article-title":"Gated Self-Matching Networks for Reading Comprehension and Question Answering","volume":"Volume 1","author":"Wang","year":"2017","journal-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Lakkaraju, H., Bach, S.H., and Leskovec, J. (2016, January 13\u201317). Interpretable Decision Sets: A Joint Framework for Description and Prediction. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD \u201916), San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939874"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Mahoney, C.J., Zhang, J., Huber-Fliflet, N., Gronvall, P., and Zhao, H. (2019, January 9\u201312). A Framework for Explainable Text Classification in Legal Document Review. Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA.","DOI":"10.1109\/BigData47090.2019.9005659"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ribeiro, M.T., Singh, S., and Guestrin, C. (2016, January 13\u201317). \u201cWhy Should I Trust You?\u201d: Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA.","DOI":"10.1145\/2939672.2939778"},{"key":"ref_6","unstructured":"Sundararajan, M., Taly, A., and Yan, Q. (2017, August 6\u201311). Axiomatic Attribution for Deep Networks. Proceedings of the 34th International Conference on Machine Learning JMLR (ICML\u201917), Sydney, Australia."},{"key":"ref_7","unstructured":"Camburu, O.M., Rockt\u00e4schel, T., Lukasiewicz, T., and Blunsom, P. (2018, January 3\u20138). 
e-SNLI: Natural Language Inference with Natural Language Explanations. Proceedings of the NeurIPS, Montr\u00e9al, QC, Canada."},{"key":"ref_8","unstructured":"Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Parikh, A., T\u00e4ckstr\u00f6m, O., Das, D., and Uszkoreit, J. (2016, January 1\u20135). A Decomposable Attention Model for Natural Language Inference. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA.","DOI":"10.18653\/v1\/D16-1244"},{"key":"ref_10","unstructured":"Vaswani, A., Shazeer, N.M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). Attention is All you Need. arXiv."},{"key":"ref_11","first-page":"4171","article-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","volume":"Volume 1","author":"Devlin","year":"2019","journal-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies"},{"key":"ref_12","unstructured":"Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv."},{"key":"ref_13","first-page":"3543","article-title":"Attention is not Explanation","volume":"Volume 1","author":"Jain","year":"2019","journal-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wiegreffe, S., and Pinter, Y. (2019, January 3\u20137). Attention is not not Explanation. 
Proceedings of the Conference on Empirical Methods in Natural Language Processing, Hong Kong, China.","DOI":"10.18653\/v1\/D19-1002"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1145\/3236386.3241340","article-title":"The Mythos of Model Interpretability","volume":"16","author":"Lipton","year":"2018","journal-title":"Queue"},{"key":"ref_16","unstructured":"Granmo, O.C. (2018). The Tsetlin Machine\u2014A Game Theoretic Bandit Driven Approach to Optimal Pattern Recognition with Propositional Logic. arXiv."},{"key":"ref_17","unstructured":"Granmo, O.C., Glimsdal, S., Jiao, L., Goodwin, M., Omlin, C.W., and Berge, G.T. (2019). The Convolutional Tsetlin Machine. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yadav, R.K., Jiao, L., Granmo, O.C., and Goodwin, M. (2021, January 2\u20139). Human-Level Interpretable Learning for Aspect-Based Sentiment Analysis. Proceedings of the AAAI, Vancouver, BC, Canada.","DOI":"10.1609\/aaai.v35i16.17671"},{"key":"ref_19","unstructured":"Bhattarai, B., Granmo, O.C., and Jiao, L. (2021). Explainable Tsetlin Machine framework for fake news detection with credibility score assessment. arXiv."},{"key":"ref_20","unstructured":"Abeyrathna, K.D., Bhattarai, B., Goodwin, M., Gorji, S.R., Granmo, O.C., Jiao, L., Saha, R., and Yadav, R.K. Massively Parallel and Asynchronous Tsetlin Machine Architecture Supporting Almost Constant-Time Scaling. Proceedings of the ICML, PMLR, Online."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Yadav, R.K., Jiao, L., Granmo, O.C., and Goodwin, M. (2021, January 4\u20136). Interpretability in Word Sense Disambiguation using Tsetlin Machine. Proceedings of the 13th International Conference on Agents and Artificial Intelligence (ICAART), Vienna, Austria.","DOI":"10.5220\/0010382104020409"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Yadav, R.K., Jiao, L., Granmo, O.C., and Goodwin, M. (2021, January 11). 
Enhancing Interpretable Clauses Semantically using Pretrained Word Representation. Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.blackboxnlp-1.19"},{"key":"ref_23","unstructured":"Zaidan, O., Eisner, J., and Piatko, C. (2007, January 22\u201327). Using Annotator Rationales to Improve Machine Learning for Text Categorization. Proceedings of the Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Rochester, NY, USA."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Cho, K., van Merrienboer, B., G\u00fcl\u00e7ehre, \u00c7., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014, January 25\u201329). Learning Phrase Representations using RNN Encoder\u2013Decoder for Statistical Machine Translation. Proceedings of the EMNLP, Doha, Qatar.","DOI":"10.3115\/v1\/D14-1179"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C.D. (2014, January 25\u201329). Glove: Global Vectors for Word Representation. Proceedings of the EMNLP, Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_26","unstructured":"Doshi-Velez, F., and Kim, B. (2017). Towards A Rigorous Science of Interpretable Machine Learning. arXiv."},{"key":"ref_27","unstructured":"Liu, H., Yin, Q., and Wang, W.Y. (2019, July 28\u2013August 2). Towards Explainable NLP: A Generative Explanation Framework for Text Classification. 
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2594473.2594475","article-title":"Comprehensible classification models: A position paper","volume":"15","author":"Freitas","year":"2014","journal-title":"SIGKDD Explor."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhang, Q., Wu, Y.N., and Zhu, S.C. (2018, January 18\u201323). Interpretable Convolutional Neural Networks. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00920"},{"key":"ref_30","unstructured":"Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A.C., Salakhutdinov, R., Zemel, R.S., and Bengio, Y. (2015, January 6\u201311). Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Proceedings of the ICML, Lille, France."},{"key":"ref_31","unstructured":"Bao, Y., Chang, S., Yu, M., and Barzilay, R. (2018, October 31\u2013November 4). Deriving Machine Attention from Human Rationales. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Mohankumar, A.K., Nema, P., Narasimhan, S., Khapra, M.M., Srinivasan, B.V., and Ravindran, B. (2020, January 5\u201310). Towards Transparent and Explainable Attention Models. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Seattle, WA, USA.","DOI":"10.18653\/v1\/2020.acl-main.387"},{"key":"ref_33","unstructured":"McDonnell, T., Lease, M., Kutlu, M., and Elsayed, T. (2016, October 30\u2013November 3). Why Is That Relevant? Collecting Annotator Rationales for Relevance Judgments. Proceedings of the HCOMP, Austin, TX, USA."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zhang, X., Jiao, L., Granmo, O.C., and Goodwin, M. (2021). 
On the Convergence of Tsetlin Machines for the IDENTITY- and NOT Operators. IEEE Trans. Pattern Anal. Mach. Intell.","DOI":"10.1109\/TPAMI.2021.3085591"},{"key":"ref_35","unstructured":"Sharma, J., Yadav, R., Granmo, O.C., and Jiao, L. (2021). Human Interpretable AI: Enhancing Tsetlin Machine Stochasticity with Drop Clause. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Lei, J., Wheeldon, A., Shafik, R., Yakovlev, A., and Granmo, O.C. From Arithmetic to Logic Based AI: A Comparative Analysis of Neural Networks and Tsetlin Machine. Proceedings of the 27th IEEE International Conference on Electronics Circuits and Systems (ICECS2020), Online.","DOI":"10.1109\/ICECS49266.2020.9294877"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Lei, J., Rahman, T., Shafik, R., Wheeldon, A., Yakovlev, A., Granmo, O.C., Kawsar, F., and Mathur, A. (2021). Low-Power Audio Keyword Spotting Using Tsetlin Machines. J. Low Power Electron. Appl., 11.","DOI":"10.20944\/preprints202101.0621.v1"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Mikolov, T., Karafi, M., and Khudanpur, S. (2010, January 26\u201330). Recurrent neural network based language model. Proceedings of the Interspeech, Makuhari, Japan.","DOI":"10.21437\/Interspeech.2010-343"},{"key":"ref_39","unstructured":"Chung, J., G\u00fcl\u00e7ehre, \u00c7., Cho, K., and Bengio, Y. (2014). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Pang, B., and Lee, L. (2005, January 26\u201331). Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales. Proceedings of the Association for Computational Linguistics, Ann Arbor, MI, USA.","DOI":"10.3115\/1219840.1219855"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Tang, J., Qu, M., and Mei, Q. (2015, January 10\u201313). 
PTE: Predictive Text Embedding through Large-Scale Heterogeneous Text Networks. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia.","DOI":"10.1145\/2783258.2783307"},{"key":"ref_42","unstructured":"(2022, March 01). Chollet, Fran\u00e7ois and Others: Keras. Available online: https:\/\/keras.io."},{"key":"ref_43","unstructured":"Kingma, D.P., and Ba, J. (2015). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_44","first-page":"1929","article-title":"Dropout: A simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Kim, Y. (2014, January 25\u201329). Convolutional Neural Networks for Sentence Classification. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1181"},{"key":"ref_46","unstructured":"Liu, P., Qiu, X., and Huang, X. (2016, July 9\u201315). Recurrent Neural Network for Text Classification with Multi-Task Learning. Proceedings of the IJCAI, New York, NY, USA."},{"key":"ref_47","unstructured":"Le, Q., and Mikolov, T. (2014, January 21\u201326). Distributed Representations of Sentences and Documents. Proceedings of the 31st International Conference on Machine Learning, Beijing, China. PMLR."},{"key":"ref_48","first-page":"427","article-title":"Bag of Tricks for Efficient Text Classification","volume":"Volume 2","author":"Joulin","year":"2017","journal-title":"Proceedings of the EACL"},{"key":"ref_49","first-page":"440","article-title":"Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms","volume":"Volume 1","author":"Shen","year":"2018","journal-title":"Proceedings of the ACL"},{"key":"ref_50","unstructured":"Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R. (2016, January 5\u201310). 
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/5\/143\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:59:11Z","timestamp":1760137151000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/5\/143"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,22]]},"references-count":50,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["a15050143"],"URL":"https:\/\/doi.org\/10.3390\/a15050143","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2022,4,22]]}}}