{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T22:41:29Z","timestamp":1773873689800,"version":"3.50.1"},"reference-count":48,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:00:00Z","timestamp":1675296000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>The World Wide Web has revolutionized the way we live, causing the number of web pages to increase exponentially. The web provides access to a tremendous amount of information, so it is difficult for internet users to locate accurate and useful information on the web. In order to categorize pages accurately based on the queries of users, methods of categorizing web pages need to be developed. The text content of web pages plays a significant role in the categorization of web pages. If a word\u2019s position is altered within a sentence, causing a change in the interpretation of that sentence, this phenomenon is called polysemy. In web page categorization, the polysemy property causes ambiguity and is referred to as the polysemy problem. This paper proposes a fine-tuned model to solve the polysemy problem, using contextual embeddings created by the symmetry multi-head encoder layer of the Bidirectional Encoder Representations from Transformers (BERT). The effectiveness of the proposed model was evaluated by using the benchmark datasets for web page categorization, i.e., WebKB and DMOZ. Furthermore, the experiment series also fine-tuned the proposed model\u2019s hyperparameters to achieve 96.00% and 84.00% F1-Scores, respectively, demonstrating the proposed model\u2019s importance compared to baseline approaches based on machine learning and deep learning.<\/jats:p>","DOI":"10.3390\/sym15020395","type":"journal-article","created":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T04:55:12Z","timestamp":1675313712000},"page":"395","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["Contextual Embeddings-Based Web Page Categorization Using the Fine-Tune BERT Model"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1168-0032","authenticated-orcid":false,"given":"Amit Kumar","family":"Nandanwar","sequence":"first","affiliation":[{"name":"Computer Science & Engineering Department, Maulana Azad National Institute of Technology, Bhopal 462003, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jaytrilok","family":"Choudhary","sequence":"additional","affiliation":[{"name":"Computer Science & Engineering Department, Maulana Azad National Institute of Technology, Bhopal 462003, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"11921","DOI":"10.1007\/s11042-019-08373-8","article-title":"Web Page Classification: A Survey of Perspectives, Gaps, and Future Directions","volume":"79","author":"Hashemi","year":"2020","journal-title":"Multimed. Tools Appl."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1459352.1459357","article-title":"Web Page Classification","volume":"41","author":"Qi","year":"2009","journal-title":"ACM Comput. Surv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"176600","DOI":"10.1109\/ACCESS.2019.2953990","article-title":"Improving BERT-Based Text Classification with Auxiliary Sentence and Domain Knowledge","volume":"7","author":"Yu","year":"2019","journal-title":"IEEE Access"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"672","DOI":"10.3390\/make3030034","article-title":"A Survey of Machine Learning-Based Solutions for Phishing Website Detection","volume":"3","author":"Tang","year":"2021","journal-title":"Mach. Learn. Knowl. Extr."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1007\/s11263-013-0636-x","article-title":"Image Classification with the Fisher Vector: Theory and Practice","volume":"105","author":"Perronnin","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1016\/j.neucom.2019.01.078","article-title":"Bidirectional LSTM with Attention Mechanism and Convolutional Layer for Text Classification","volume":"337","author":"Liu","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"510","DOI":"10.1016\/j.future.2017.03.003","article-title":"An Optimized Approach for Massive Web Page Classification Using Entity Similarity Based on Semantic Network","volume":"76","author":"Li","year":"2017","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1007\/978-3-319-12979-2_6","article-title":"News Articles Classification Using Random Forests and Weighted Multimodal Features","volume":"Volume 8849","author":"Liparas","year":"2014","journal-title":"Multidisciplinary Information Retrieval"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Nandanwar, A.K., and Choudhary, J. (2021). Semantic Features with Contextual Knowledge-Based Web Page Categorization Using the GloVe Model and Stacked BiLSTM. Symmetry, 13.","DOI":"10.3390\/sym13101772"},{"key":"ref_10","first-page":"619","article-title":"Web Page Categorization Based on Images as Multimedia Visual Feature Using Deep Convolution Neural Network","volume":"11","author":"Nandanwar","year":"2020","journal-title":"Int. J. Emerg. Technol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"25219","DOI":"10.1007\/s11042-021-10891-3","article-title":"Ensemble Approach for Web Page Classification","volume":"80","author":"Gupta","year":"2021","journal-title":"Multimed. Tools Appl."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1049\/trit.2018.1015","article-title":"CNN-RNN Based Method for License Plate Recognition","volume":"3","author":"Shivakumara","year":"2018","journal-title":"CAAI Trans. Intell. Technol."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3774607","DOI":"10.1155\/2021\/3774607","article-title":"Automated Amharic News Categorization Using Deep Learning Models","volume":"2021","author":"Endalie","year":"2021","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1016\/j.cogsys.2019.12.005","article-title":"FNDNet\u2013A Deep Convolutional Neural Network for Fake News Detection","volume":"61","author":"Kaliyar","year":"2020","journal-title":"Cogn. Syst. Res."},{"key":"ref_15","first-page":"64","article-title":"Improving the Performance of Aspect Based Sentiment Analysis Using Fine-Tuned Bert Base Uncased Model","volume":"2","author":"Geetha","year":"2021","journal-title":"Int. J. Intell. Netw."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"73992","DOI":"10.1109\/ACCESS.2020.2988550","article-title":"Sentiment Classification Using a Single-Layered BiLSTM Model","volume":"8","author":"Hameed","year":"2020","journal-title":"IEEE Access"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C. (2014, January 26\u201328). Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2291","DOI":"10.1080\/09540091.2022.2117274","article-title":"WTL-CNN: A News Text Classification Method of Convolutional Neural Network Based on Weighted Word Embedding","volume":"34","author":"Zhao","year":"2022","journal-title":"Connect. Sci."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"769","DOI":"10.1016\/j.procs.2022.09.132","article-title":"Combining FastText and Glove Word Embedding for Offensive and Hate Speech Text Detection","volume":"207","author":"Badri","year":"2022","journal-title":"Procedia Comput. Sci."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Artene, C.G., Tibeica, M.N., and Leon, F. (2021, January 28\u201330). Using BERT for Multi-Label Multi-Language Web Page Classification. Proceedings of the 2021 IEEE 17th International Conference on Intelligent Computer Communication and Processing ICCP 2021, Cluj-Napoca, Romania.","DOI":"10.1109\/ICCP53602.2021.9733492"},{"key":"ref_21","first-page":"98","article-title":"Fake News Classification Using Transformer Based Enhanced LSTM and BERT","volume":"3","author":"Rai","year":"2022","journal-title":"Int. J. Cogn. Comput. Eng."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"4931","DOI":"10.1016\/j.matpr.2022.03.678","article-title":"Sentimental Analysis on User\u2019s Reviews Using BERT","volume":"62","author":"Selvakumar","year":"2022","journal-title":"Mater. Today Proc."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"102006","DOI":"10.1016\/j.cose.2020.102006","article-title":"Efficient Classification Model of Web News Documents Using Machine Learning Algorithms for Accurate Information","volume":"98","author":"Mulahuwaish","year":"2020","journal-title":"Comput. Secur."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1002\/int.21567","article-title":"Image Classification Based on the Combination of Text Features and Visual Features","volume":"28","author":"Tian","year":"2013","journal-title":"Int. J. Intell. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/j.ins.2003.03.003","article-title":"Web Page Feature Selection and Classification Using Neural Networks","volume":"158","author":"Selamat","year":"2004","journal-title":"Inf. Sci."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1016\/j.amc.2015.07.120","article-title":"Web Page Classification Based on a Simplified Swarm Optimization","volume":"270","author":"Lee","year":"2015","journal-title":"Appl. Math. Comput."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Bacanin, N., Zivkovic, M., Stoean, C., Antonijevic, M., Janicijevic, S., Sarac, M., and Strumberger, I. (2022). Application of Natural Language Processing and Machine Learning Boosted with Swarm Intelligence for Spam Email Filtering. Mathematics, 10.","DOI":"10.3390\/math10224173"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3407","DOI":"10.1016\/j.eswa.2010.08.126","article-title":"A Web Page Classification System Based on a Genetic Algorithm Using Tagged-Terms as Features","volume":"38","year":"2011","journal-title":"Expert Syst. Appl."},{"key":"ref_29","first-page":"649260","article-title":"An Ant Colony Optimization Based Feature Selection for Web Page Classification","volume":"2014","year":"2014","journal-title":"Sci. World J."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Guo, Y., Mustafaoglu, Z., and Koundal, D. (2022). Spam Detection Using Bidirectional Transformers and Machine Learning Classifier Algorithms. J. Comput. Cogn. Eng.","DOI":"10.47852\/bonviewJCCE2202192"},{"key":"ref_31","first-page":"9534918","article-title":"Web Page Classification Algorithm Based on Deep Learning","volume":"2022","author":"Yu","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_32","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the NAACL HLT 2019-2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3439726","article-title":"Deep Learning Based Text Classification: A Comprehensive Review","volume":"54","author":"Minaee","year":"2020","journal-title":"ACM Comput. Surv."},{"key":"ref_34","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},{"key":"ref_35","unstructured":"Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. (2019). ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations. arXiv."},{"key":"ref_36","unstructured":"Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. arXiv."},{"key":"ref_37","unstructured":"Li, C., and Liu, K. (2021). Smart Search Engine: A Design and Test of Intelligent Search of News with Classification. [Bachelor\u2019s Thesis, Dalarna University]."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1111\/coin.12478","article-title":"A Heterogeneous Stacking Ensemble Based Sentiment Analysis Framework Using Multiple Word Embeddings","volume":"38","author":"Subba","year":"2022","journal-title":"Comput. Intell."},{"key":"ref_39","unstructured":"McCallum (2021, July 12). The 4 Universities Data Set. Available online: http:\/\/www.cs.cmu.edu\/afs\/cs.cmu.edu\/project\/theo-20\/www\/data\/."},{"key":"ref_40","unstructured":"(2021, August 16). DMOZ-The Directory of the Web. Available online: https:\/\/www.dmoz-odp.org\/."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1007\/978-981-13-1595-4_45","article-title":"Hybrid System for MPAA Ratings of Movie Clips Using Support Vector Machine","volume":"Volume 817","author":"Vishwakarma","year":"2019","journal-title":"Advances in Intelligent Systems and Computing"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1016\/j.artmed.2018.11.004","article-title":"Comparative Effectiveness of Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) Architectures for Radiology Text Report Classification","volume":"97","author":"Banerjee","year":"2019","journal-title":"Artif. Intell. Med."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Solanki, S., Dehalwar, V., and Choudhary, J. (2021). Deep Learning for Spectrum Sensing in Cognitive Radio. Symmetry, 13.","DOI":"10.3390\/sym13010147"},{"key":"ref_44","first-page":"397","article-title":"Comparative Performance Analysis of Combined Svm-Pca for Content-Based Video Classification by Utilizing Inception V3","volume":"10","author":"Vishwakarma","year":"2019","journal-title":"Int. J. Emerg. Technol."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"106652","DOI":"10.1016\/j.asoc.2020.106652","article-title":"Lazy Fine-Tuning Algorithms for Na\u00efve Bayesian Text Classification","volume":"96","author":"Aljulaidan","year":"2020","journal-title":"Appl. Soft Comput. J."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Brahma, B., and Wadhvani, R. (2020). Solar Irradiance Forecasting Based on Deep Learning Methodologies and Multi-Site Data. Symmetry, 12.","DOI":"10.3390\/sym12111830"},{"key":"ref_47","first-page":"8422","article-title":"Contextual Semantic Embeddings Based on Fine-Tuned AraBERT Model for Arabic Text Multi-Class Categorization","volume":"34","year":"2022","journal-title":"J. King Saud Univ.-Comput. Inf. Sci."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1080\/13614568.2016.1152316","article-title":"An Efficient Scheme for Automatic Web Pages Categorization Using the Support Vector Machine","volume":"22","author":"Bhalla","year":"2016","journal-title":"New Rev. Hypermedia Multimed."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/15\/2\/395\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:22:10Z","timestamp":1760120530000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/15\/2\/395"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,2]]},"references-count":48,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["sym15020395"],"URL":"https:\/\/doi.org\/10.3390\/sym15020395","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,2]]}}}