{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T00:28:41Z","timestamp":1777854521205,"version":"3.51.4"},"reference-count":27,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2021,6,14]],"date-time":"2021-06-14T00:00:00Z","timestamp":1623628800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2023,6]]},"abstract":"<jats:p>Document retrieval plays an important role in knowledge management as it facilitates us to discover the relevant information from the existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, the pre-processing is done to remove the unnecessary and redundant words from the documents. Then, the indexing of documents is done by the cluster-based inverted indexing algorithm, which is developed by integrating the piecewise fuzzy C-means (piFCM) clustering algorithm and inverted indexing. After providing the index to the documents, the query matching is performed for the user queries using the Bhattacharyya distance. Finally, the query optimisation is done by the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is analysed by the WebKB data set and Twenty Newsgroups data set. The analysis exposes that the proposed algorithm offers high performance with a precision of 1, recall of 0.70 and F-measure of 0.8235. The proposed document retrieval system retrieves the most relevant documents and speeds up the storing and retrieval of information.<\/jats:p>","DOI":"10.1177\/01655515211018401","type":"journal-article","created":{"date-parts":[[2021,6,14]],"date-time":"2021-06-14T23:48:28Z","timestamp":1623714508000},"page":"726-739","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":9,"title":["An approach for document retrieval using cluster-based inverted indexing"],"prefix":"10.1177","volume":"49","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8929-3927","authenticated-orcid":false,"given":"Gunjan","family":"Chandwani","sequence":"first","affiliation":[{"name":"Manav Rachna University, India; Dr. A.P.J. Abdul Kalam Technical University (AKTU), India"}]},{"given":"Anil","family":"Ahlawat","sequence":"additional","affiliation":[{"name":"Academics, KIET Group of Institutions, India"}]},{"given":"Gaurav","family":"Dubey","sequence":"additional","affiliation":[{"name":"ABES Engineering College, India"}]}],"member":"179","published-online":{"date-parts":[[2021,6,14]]},"reference":[{"key":"bibr1-01655515211018401","first-page":"379","volume-title":"Big data analytics and knowledge discovery","author":"Chevalier M","year":"2016"},{"key":"bibr2-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1145\/1871437.1871454"},{"key":"bibr3-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-49944-4_18"},{"key":"bibr4-01655515211018401","unstructured":"Aye KN, Thein NL. A comparison of big data analytics approaches based on Hadoop MapReduce. In: Proceedings of the 11th international conference on computer applications, Yangon, Myanmar, 2013, https:\/\/www.academia.edu\/3502325\/A_Comparison_of_Big_Data_Analytics_Approaches_Based_on_Hadoop_MapReduce"},{"key":"bibr5-01655515211018401","volume-title":"Data warehousing in the age of big data","author":"Krishnan K","year":"2013"},{"key":"bibr6-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1006\/cviu.1998.0692"},{"key":"bibr7-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2017.12.007"},{"key":"bibr8-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1002\/asi.10257"},{"key":"bibr9-01655515211018401","first-page":"743","volume-title":"Proceedings of the international symposium on communications and information technologies","author":"Thammasut D"},{"key":"bibr10-01655515211018401","doi-asserted-by":"publisher","DOI":"10.3802\/jgo.2020.31.e14"},{"key":"bibr11-01655515211018401","first-page":"216","volume":"7","author":"Maron ME","year":"1960","journal-title":"ACM"},{"key":"bibr12-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1145\/125187.125189"},{"key":"bibr13-01655515211018401","volume-title":"Proceedings of the 7th international conference on signal image technology & internet-based systems","author":"Chahine CA"},{"key":"bibr14-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1145\/356924.356928"},{"key":"bibr15-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1145\/183422.183424"},{"key":"bibr16-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(00)00008-X"},{"key":"bibr17-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2018.09.002"},{"key":"bibr18-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2018.11.010"},{"key":"bibr19-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2018.11.035"},{"key":"bibr20-01655515211018401","first-page":"26","volume":"14","author":"Gonz\u00e1lez SM","year":"2014","journal-title":"Rev Sist Inform"},{"issue":"2","key":"bibr21-01655515211018401","first-page":"59","volume":"18","author":"Rad HZ","year":"2018","journal-title":"J Lang Stud"},{"key":"bibr22-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-10-8055-5_55"},{"key":"bibr23-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2018.06.029"},{"key":"bibr24-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-10-8848-3_5"},{"key":"bibr25-01655515211018401","doi-asserted-by":"publisher","DOI":"10.1109\/TFUZZ.2017.2742463"},{"key":"bibr26-01655515211018401","unstructured":"Twenty Newsgroups Data Set, https:\/\/archive.ics.uci.edu\/ml\/datasets\/Twenty+Newsgroups (accessed May 2019)."},{"key":"bibr27-01655515211018401","unstructured":"The Reuters Dataset, https:\/\/martin-thoma.com\/nlp-reuters\/ (accessed May 2019)."}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515211018401","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/01655515211018401","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515211018401","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T23:09:20Z","timestamp":1777504160000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/01655515211018401"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,14]]},"references-count":27,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,6]]}},"alternative-id":["10.1177\/01655515211018401"],"URL":"https:\/\/doi.org\/10.1177\/01655515211018401","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"value":"0165-5515","type":"print"},{"value":"1741-6485","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,14]]}}}