{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T14:29:36Z","timestamp":1761920976378,"version":"build-2065373602"},"reference-count":40,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,1,23]],"date-time":"2022-01-23T00:00:00Z","timestamp":1642896000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Deputyship for Research  and  Innovation,  Ministry  of  Education  in  Saudi Arabia","award":["20\/18"],"award-info":[{"award-number":["20\/18"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Determining hadith authenticity is vitally important in the Islamic religion because hadiths record the sayings and actions of Prophet Muhammad (PBUH), and they are the second source of Islamic teachings following the Quran. When authenticating a hadith, the reliability of the hadith narrators is a big factor that hadith scholars consider. However, many narrators share similar names, and the narrators\u2019 full names are not usually included in the narration chains of hadiths. Thus, first, ambiguous narrators need to be identified. Then, their reliability level can be determined. There are no available datasets that could help address this problem of identifying narrators. Here, we present a new dataset that contains narration chains (sanads) with identified narrators. The AR-Sanad 280K dataset has around 280K artificial sanads and could be used to identify 18,298 narrators. After creating the AR-Sanad 280K dataset, we address the narrator disambiguation in several experimental setups. The hadith narrator disambiguation is modeled as a multiclass classification problem with 18,298 class labels. We test different representations and models in our experiments. The best results were achieved by finetuning BERT-Based deep learning model (AraBERT). We obtained a 92.9 Micro F1 score and 30.2 sanad error rate (SER) on the validation set of our artificial sanads AR-Sanad 280K dataset. Furthermore, we extracted a real test set from the sanads of the famous six books in Islamic hadith. We evaluated the best model on the real test data, and we achieved 83.5 Micro F1 score and 60.6 sanad error rate.<\/jats:p>","DOI":"10.3390\/info13020055","type":"journal-article","created":{"date-parts":[[2022,1,23]],"date-time":"2022-01-23T20:32:52Z","timestamp":1642969972000},"page":"55","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["AR-Sanad 280K: A Novel 280K Artificial Sanads Dataset for Hadith Narrator Disambiguation"],"prefix":"10.3390","volume":"13","author":[{"given":"Somaia","family":"Mahmoud","sequence":"first","affiliation":[{"name":"Department of Computer and Systems Engineering, Alexandria University, Alexandria 21526, Egypt"}]},{"given":"Omar","family":"Saif","sequence":"additional","affiliation":[{"name":"Faculty of Hadith and Islamic Studies, Islamic University of Madinah, Madinah 42351, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3729-5079","authenticated-orcid":false,"given":"Emad","family":"Nabil","sequence":"additional","affiliation":[{"name":"Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah 42351, Saudi Arabia"},{"name":"Faculty of Computers and Artificial Intelligence, Cairo University, Giza 12613, Egypt"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9326-8409","authenticated-orcid":false,"given":"Mohammad","family":"Abdeen","sequence":"additional","affiliation":[{"name":"Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah 42351, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2215-8736","authenticated-orcid":false,"given":"Mustafa","family":"ElNainay","sequence":"additional","affiliation":[{"name":"Department of Computer and Systems Engineering, Alexandria University, Alexandria 21526, Egypt"},{"name":"Faculty of Computer Science and Engineering, AlAlamein International University, Matrouh 51718, Egypt"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6149-1718","authenticated-orcid":false,"given":"Marwan","family":"Torki","sequence":"additional","affiliation":[{"name":"Department of Computer and Systems Engineering, Alexandria University, Alexandria 21526, Egypt"}]}],"member":"1968","published-online":{"date-parts":[[2022,1,23]]},"reference":[{"key":"ref_1","unstructured":"Esposito, J.L. (2010). The Future of Islam, Oxford University Press."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Khan, I.A. (2010). Authentication of Hadith: Redefining the Criteria, Iiit.","DOI":"10.2307\/j.ctvkc67mk"},{"key":"ref_3","unstructured":"(1996). \u0645\u0642\u062f\u0645\u0629 \u0627\u0644\u0646\u0648\u0648\u064a \u0641\u064a \u0639\u0644\u0648\u0645 \u0627\u0644\u062d\u062f\u064a\u062b: \u0648\u0647\u064a \u0645\u0642\u062f\u0645\u0629\u0639\u0644\u0649 \u0635\u062d\u064a\u062d \u0645\u0633\u0644\u0645."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1369","DOI":"10.1007\/s10462-019-09692-w","article-title":"Computational and natural language processing based studies of hadith literature: A survey","volume":"52","author":"Azmi","year":"2019","journal-title":"Artif. Intell. Rev."},{"key":"ref_5","first-page":"101","article-title":"Analysis Name Entity Disambiguation Using Mining Evidence Method","volume":"22","author":"Astari","year":"2020","journal-title":"Paradig. J. Inform. Komput."},{"key":"ref_6","unstructured":"Azmi, A.M., and AlOfaidly, A.M. (2014, January 26\u201327). A novel method to automatically pass hukm on hadith. Proceedings of the 5th International Conference on Arabic Language Processing (CITALA\u201914), Oujda, Morocco."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1093\/jis\/2.1.86","article-title":"A note on work in progress on computerization of hadith","volume":"2","year":"1991","journal-title":"J. Islam. Stud."},{"key":"ref_8","unstructured":"Alias, N., Abd Rahman, N., Nor, Z., and Alias, M. (2016, January 30\u201331). Searching algorithm of authentic chain of narrators\u2019 in Shahih Bukhari book. Proceedings of the International Conference on Applied Computing, Mathematical Sciences and Engineering (ACME 2016), Johor Bahru, Malaysia."},{"key":"ref_9","first-page":"5054","article-title":"Digital hadith authentication: A literature review and analysis","volume":"96","author":"Luthfi","year":"2018","journal-title":"J. Theor. Appl. Inf. Technol."},{"key":"ref_10","first-page":"165","article-title":"A multilingual datasets repository of the hadith content","volume":"9","author":"Mahmood","year":"2018","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_11","unstructured":"Altammami, S., Atwell, E., and Alsalka, A. (2022, January 20\u201325). Constructing a Bilingual Hadith Corpus Using a Segmentation Tool. Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Hadiwinoto, C., Ng, H.T., and Gan, W.C. (2019, January 3\u20137). Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1533"},{"key":"ref_13","unstructured":"Loureiro, D., and Jorge, A. (August, January 28). Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Levine, Y., Lenz, B., Dagan, O., Ram, O., Padnos, D., Sharir, O., Shalev-Shwartz, S., Shashua, A., and Shoham, Y. (2020, January 5\u201310). SenseBERT: Driving Some Sense into BERT. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Virtual Online.","DOI":"10.18653\/v1\/2020.acl-main.423"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bevilacqua, M., and Navigli, R. (2020, January 5\u201310). Breaking through the 80% glass ceiling: Raising the state of the art in word sense disambiguation by incorporating knowledge graph information. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Virtual Online.","DOI":"10.18653\/v1\/2020.acl-main.255"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Huang, L., Sun, C., Qiu, X., and Huang, X.J. (2019, January 3\u20137). GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1355"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Blevins, T., and Zettlemoyer, L. (2020, January 5\u201310). Moving Down the Long Tail of Word Sense Disambiguation with Gloss Informed Bi-encoders. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Virtual Online.","DOI":"10.18653\/v1\/2020.acl-main.95"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yosef, M.A., Spaniol, M., and Weikum, G. (2014, January 25). AIDArabic A Named-Entity Disambiguation Framework for Arabic Text. Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), Doha, Qatar.","DOI":"10.3115\/v1\/W14-3626"},{"key":"ref_19","unstructured":"Hoffart, J., Yosef, M.A., Bordino, I., F\u00fcrstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., and Weikum, G. (2011, January 27\u201331). Robust disambiguation of named entities in text. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Al-Smadi, M., Talafha, B., Qawasmeh, O., Alandoli, M.N., Hussien, W.A., and Guetl, C. (2015, January 21\u201322). A hybrid approach for Arabic named entity disambiguation. Proceedings of the 15th International Conference on Knowledge Technologies and Data-Driven Business, Graz, Austria.","DOI":"10.1145\/2809563.2809589"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Gad-Elrab, M.H., Yosef, M.A., and Weikum, G. (2015, January 23). Named entity disambiguation for resource-poor languages. Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval, Melbourne, Australia.","DOI":"10.1145\/2810133.2810138"},{"key":"ref_22","unstructured":"Mahdisoltani, F., Biega, J., and Suchanek, F.M. (2021, November 15). A Knowledge Base from Multilingual Wikipedias\u2013Yago3. Technical Report, Technical Report, Telecom ParisTech. Available online: https:\/\/suchanek.name\/work\/publications\/cidr2015.pdf."},{"key":"ref_23","unstructured":"Steinberger, R., Pouliquen, B., Kabadjov, M., Belyaeva, J., and van der Goot, E. (2011, January 12\u201314). JRC-NAMES: A Freely Available, Highly Multilingual Named Entity Resource. Proceedings of the International Conference Recent Advances in Natural Language Processing, Hissar, Bulgaria."},{"key":"ref_24","unstructured":"Spitkovsky, V.I., and Chang, A.X. (2012, January 21\u201327). A cross-lingual dictionary for english wikipedia concepts. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC\u201912), Istanbul, Turkey."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"68","DOI":"10.29408\/edumatic.v4i2.2551","article-title":"Name Disambiguation Analysis Using the Word Sense Disambiguation Method in Hadith","volume":"4","author":"Prasetio","year":"2020","journal-title":"Edumatic J. Pendidik. Inform."},{"key":"ref_26","first-page":"379","article-title":"The Attention Given to Al-Muhmaluun (the Unspecified) Narrators in the Program of the Custodian of the Two Holy Mosques for the Prophetic Sunnah","volume":"1","year":"2020","journal-title":"Islam. Univ. J."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Shukur, Z., Fabil, N., Salim, J., and Noah, S.A. (2011). Visualization of the hadith chain of narrators. Proceedings of the International Visual Informatics Conference, Springer.","DOI":"10.1007\/978-3-642-25200-6_32"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Boella, M., Romani, F.R., Al-Raies, A., Solimando, C., and Lancioni, G. (2011). The SALAH Project: Segmentation and linguistic analysis of Hadith Arabic texts. Proceedings of the Asia Information Retrieval Symposium, Springer.","DOI":"10.1007\/978-3-642-25631-8_49"},{"key":"ref_29","first-page":"14","article-title":"Extraction and visualization of the chain of narrators from hadiths using named entity recognition and classification","volume":"5","author":"Siddiqui","year":"2014","journal-title":"Int. J. Comput. Linguist. Res"},{"key":"ref_30","first-page":"287","article-title":"A domain-based approach to extract Arabic person names using n-grams and simple rules","volume":"14","author":"Alhawarat","year":"2015","journal-title":"Asian J. Inf. Technol."},{"key":"ref_31","first-page":"9","article-title":"Data mining in Sciences of the prophet\u2019s tradition in general and in impeachment and amendment in particular","volume":"3","author":"Hamam","year":"2015","journal-title":"Int. J. Islam. Appl. Comput. Sci. Technol."},{"key":"ref_32","first-page":"153","article-title":"Multi-agent system for hadith processing","volume":"9","author":"Najeeb","year":"2015","journal-title":"Int. J. Softw. Eng. Appl."},{"key":"ref_33","unstructured":"Kenton, J.D.M.W.C., and Toutanova, L.K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the NAACL-HLT, Minnesota, MN, USA."},{"key":"ref_34","unstructured":"Antoun, W., Baly, F., and Hajj, H. (2020, January 11\u201316). AraBERT: Transformer-based Model for Arabic Language Understanding. Proceedings of the LREC 2020 Workshop Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Abdelali, A., Darwish, K., Durrani, N., and Mubarak, H. (2016, January 12\u201316). Farasa: A fast and furious segmenter for arabic. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, San Diego, CA, USA.","DOI":"10.18653\/v1\/N16-3003"},{"key":"ref_36","unstructured":"Antoun, W., Baly, F., and Hajj, H. (2021, January 19). AraELECTRA: Pre-Training Text Discriminators for Arabic Language Understanding. Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine."},{"key":"ref_37","unstructured":"Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. (1909). Albert: A lite bert for self-supervised learning of language representations. arXiv."},{"key":"ref_38","unstructured":"(2014). \u0645\u062c\u0645\u0648\u0639\u0627\u062a \u0627\u0644\u0639\u0645\u0644: \u0627\u0644\u0645\u0647\u0645\u0627\u062a \u0648\u0627\u0644\u0645\u0646\u0627\u0647\u062c \u0648\u0627\u0644\u0636\u0648\u0627\u0628\u0637 \u0627\u0644\u0639\u0645\u0644\u064a\u0629."},{"key":"ref_39","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_40","unstructured":"You, K., Long, M., Wang, J., and Jordan, M.I. (2019). How does learning rate decay help modern neural networks?. arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/2\/55\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:06:08Z","timestamp":1760133968000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/2\/55"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,23]]},"references-count":40,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,2]]}},"alternative-id":["info13020055"],"URL":"https:\/\/doi.org\/10.3390\/info13020055","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2022,1,23]]}}}