{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T15:10:06Z","timestamp":1751382606856,"version":"3.41.0"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"6","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>One of the principal objectives of Natural Language Processing (NLP) is to generate meaningful representations from text. Improving the informativeness of the representations has led to a tremendous rise in the dimensionality and the memory footprint. It leads to a cascading effect amplifying the complexity of the downstream model by increasing its parameters. The available techniques cannot be applied to cross-modal applications such as text-to-image. To ameliorate these issues, a novel Text-to-Image Fixed-dimensional encoding technique through a self-supervised Variational Auto-Encoder (VAE) for semantic evaluation applying transformers (TexIm FAST) has been proposed in this article. The pictorial representations allow oblivious inference while retaining the linguistic intricacies and are potent in cross-modal applications. TexIm FAST deals with variable-length sequences and generates uniform-dimensional images with over 75% reduced memory footprint. It enhances the efficiency of the models for downstream tasks by reducing its parameters. The efficacy of TexIm FAST has been extensively analyzed for the task of Semantic Textual Similarity (STS) on a benchmark dataset and two new datasets put forth containing disproportionate sequences. The results demonstrate its exceptional ability to compare disparate-length sequences such as a text with its summary with 3% improvement in accuracy compared to the SOTA despite having 68% less parameters.<\/jats:p>","DOI":"10.1145\/3735974","type":"journal-article","created":{"date-parts":[[2025,5,16]],"date-time":"2025-05-16T16:46:56Z","timestamp":1747414016000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["TexIm FAST: Text-to-Image Encoding for Semantic Similarity Evaluation of Disproportionate Sequences"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9191-1771","authenticated-orcid":false,"given":"Wazib","family":"Ansar","sequence":"first","affiliation":[{"name":"A. K. Choudhury School of IT, University of Calcutta, Kolkata, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3392-6470","authenticated-orcid":false,"given":"Saptarsi","family":"Goswami","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Bangabasi Morning College, Kolkata, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4380-3172","authenticated-orcid":false,"given":"Amlan","family":"Chakrabarti","sequence":"additional","affiliation":[{"name":"A. K. Choudhury School of IT, University of Calcutta, Kolkata, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6211-9790","authenticated-orcid":false,"given":"Basabi","family":"Chakraborty","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Madanapalle Institute of Technology and Science, Madanapalle, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,7]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-022-03865-x"},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/978-981-19-7867-8_11","volume-title":"Proceedings of Computer Vision and Machine Intelligence (CVMI \u201922)","author":"Ansar Wazib","year":"2023","unstructured":"Wazib Ansar, Saptarsi Goswami, Amlan Chakrabarti, and Basabi Chakraborty. 2023. TexIm: A novel text-to-image encoding technique using BERT. In Proceedings of Computer Vision and Machine Intelligence (CVMI \u201922). Springer, 123\u2013139."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3440755"},{"key":"e_1_3_2_5_2","volume-title":"Proceedings of the Workshop on Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.48"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3406095"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589002"},{"key":"e_1_3_2_9_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171\u20134186."},{"key":"e_1_3_2_10_2","volume-title":"Proceedings of the 3rd International Workshop on Paraphrasing (IWP \u201905)","author":"Dolan Bill","year":"2005","unstructured":"Bill Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the 3rd International Workshop on Paraphrasing (IWP \u201905)."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2020.113679"},{"issue":"6","key":"e_1_3_2_12_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3345317","article-title":"Academic plagiarism detection: A systematic literature review","volume":"52","author":"Folt\u1ef3nek Tom\u00e1\u0161","year":"2019","unstructured":"Tom\u00e1\u0161 Folt\u1ef3nek, Norman Meuschke, and Bela Gipp. 2019. Academic plagiarism detection: A systematic literature review. ACM Computing Surveys 52, 6 (2019), 1\u201342.","journal-title":"ACM Computing Surveys"},{"key":"e_1_3_2_13_2","unstructured":"Ian Goodfellow Yoshua Bengio and Aaron Courville. 2016. Deep Learning. MIT Press."},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1108"},{"key":"e_1_3_2_15_2","first-page":"95","volume-title":"Proceedings of the 2019 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications\/13th IEEE International Conference on Big Data Science and Engineering (TrustCom\/BigDataSE). IEEE","author":"He Ke","year":"2019","unstructured":"Ke He and Dong-Seong Kim. 2019. Malware detection with malware images using deep learning techniques. In Proceedings of the 2019 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications\/13th IEEE International Conference on Big Data Science and Engineering (TrustCom\/BigDataSE). IEEE, 95\u2013102."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2023.3310002"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2015.01.001"},{"key":"e_1_3_2_19_2","first-page":"957","volume-title":"Proceedings of the International Conference on Machine Learning. PMLR","author":"Kusner Matt","year":"2015","unstructured":"Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From word embeddings to document distances. In Proceedings of the International Conference on Machine Learning. PMLR, 957\u2013966."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3099021"},{"key":"e_1_3_2_21_2","first-page":"1","volume-title":"Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN)","author":"Liu Danyang","year":"2019","unstructured":"Danyang Liu and Gongshen Liu. 2019. A transformer-based variational autoencoder for sentence generation. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 1\u20137."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289183"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2019.04.054"},{"issue":"23","key":"e_1_3_2_24_2","doi-asserted-by":"crossref","first-page":"e5627","DOI":"10.1002\/cpe.5627","article-title":"Cyberbullying detection in social media text based on character-level convolutional neural network with shortcuts","volume":"32","author":"Lu Nijia","year":"2020","unstructured":"Nijia Lu, Guohua Wu, Zhen Zhang, Yitao Zheng, Yizhi Ren, and Kim-Kwang Raymond Choo. 2020. Cyberbullying detection in social media text based on character-level convolutional neural network with shortcuts. Concurrency and Computation: Practice and Experience 32, 23 (2020), e5627.","journal-title":"Concurrency and Computation: Practice and Experience"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K16-1028"},{"key":"e_1_3_2_26_2","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Narayan Shashi","year":"2018","unstructured":"Shashi Narayan, Shay B. Cohen, and Mirella Lapata. 2018. Don\u2019t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/2016904.2016908"},{"issue":"3","key":"e_1_3_2_28_2","doi-asserted-by":"crossref","first-page":"e12579","DOI":"10.1002\/eng2.12579","article-title":"Sentiment analysis on social media tweets using dimensionality reduction and natural language processing","volume":"5","author":"Omuya Erick Odhiambo","year":"2023","unstructured":"Erick Odhiambo Omuya, George Okeyo, and Michael Kimwele. 2023. Sentiment analysis on social media tweets using dimensionality reduction and natural language processing. Engineering Reports 5, 3 (2023), e12579.","journal-title":"Engineering Reports"},{"key":"e_1_3_2_29_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311\u2013318."},{"key":"e_1_3_2_30_2","unstructured":"Stephen M. Petrie and T\u2019Mir D. Julius. 2019. Representing text as abstract images enables image classifiers to also simultaneously classify text. arXiv:1908.07846. Retrieved from https:\/\/arxiv.org\/abs\/1908.07846"},{"issue":"140","key":"e_1_3_2_31_2","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21, 140 (2020), 1\u201367.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_32_2","first-page":"8821","volume-title":"Proceedings of the International Conference on Machine Learning. PMLR","author":"Ramesh Aditya","year":"2021","unstructured":"Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-shot text-to-image generation. In Proceedings of the International Conference on Machine Learning. PMLR, 8821\u20138831."},{"key":"e_1_3_2_33_2","volume-title":"Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC \u201918)","author":"Rodrigues Joao","year":"2018","unstructured":"Joao Rodrigues, Chakaveh Saedi, Ant\u00f3nio Branco, and Joao Silva. 2018. Semantic equivalence detection: Are interrogatives harder than declaratives? In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC \u201918)."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2012.08.049"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3650033"},{"issue":"6","key":"e_1_3_2_36_2","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1016\/j.jksuci.2018.08.005","article-title":"A literature review on question answering techniques, paradigms and systems","volume":"32","author":"Antonio Calijorne Soares Marco","year":"2020","unstructured":"Marco Antonio Calijorne Soares and Fernando Silva Parreiras. 2020. A literature review on question answering techniques, paradigms and systems. Journal of King Saud University-Computer and Information Sciences 32, 6 (2020), 635\u2013646.","journal-title":"Journal of King Saud University-Computer and Information Sciences"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D13-1170"},{"key":"e_1_3_2_38_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"31","author":"Subramanian Sandeep","year":"2018","unstructured":"Sandeep Subramanian, Sai Rajeswar Mudumba, Alessandro Sordoni, Adam Trischler, Aaron C. Courville, and Chris Pal. 2018. Towards text generation with adversarially learned neural outlines. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 31."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2020.114101"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3165573"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2019.102090"},{"key":"e_1_3_2_42_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 30."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2023.103576"},{"key":"e_1_3_2_44_2","first-page":"1340","volume-title":"Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING \u201916)","author":"Wang Zhiguo","year":"2016","unstructured":"Zhiguo Wang, Haitao Mi, and Abraham Ittycheriah. 2016. Sentence similarity learning by lexical decomposition and composition. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING \u201916), 1340\u20131349."},{"key":"e_1_3_2_45_2","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V. Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey et al. 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144. Retrieved from https:\/\/arxiv.org\/abs\/1609.08144"},{"key":"e_1_3_2_46_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"32","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 32."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-019-7541-4"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3607192"},{"key":"e_1_3_2_49_2","doi-asserted-by":"crossref","first-page":"15590","DOI":"10.18653\/v1\/2023.acl-long.869","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Zhao Xuandong","year":"2023","unstructured":"Xuandong Zhao, Siqi Ouyang, Zhiguo Yu, Ming Wu, and Lei Li. 2023. Pre-trained language models can be fully zero-shot learners. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 15590\u201315606."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3735974","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T14:47:19Z","timestamp":1751381239000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3735974"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,30]]},"references-count":48,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3735974"],"URL":"https:\/\/doi.org\/10.1145\/3735974","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2025,6,30]]},"assertion":[{"value":"2024-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}