{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,21]],"date-time":"2025-04-21T18:05:21Z","timestamp":1745258721907,"version":"3.38.0"},"reference-count":29,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2014,4,11]],"date-time":"2014-04-11T00:00:00Z","timestamp":1397174400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2014,8]]},"abstract":"<jats:p> Stop word identification is one of the most important tasks for many text processing applications such as information retrieval. Stop words occur too frequently in documents in a collection and do not contribute significantly to determining the context or information about the documents. These words are worthless as index terms and should be removed during indexing as well as before querying by an information retrieval system. In this paper, we propose an automatic aggregated methodology based on term frequency, normalized inverse document frequency and information model to extract the light stop words from Persian text. We define a \u2018light stop word\u2019 as a stop word that has few letters and is not a compound word. In the Persian language, a complete stop word list can be derived by combining the light stop words. The evaluation results, using a standard corpus, show a good percentage of coincidence between the Persian and English stop words and a significant improvement in the number of index terms. Specifically, the first 32 Persian light stop words have a great impact on the index size reduction and the set of stop words can reduce the number of index terms by about 27%. 
<\/jats:p>","DOI":"10.1177\/0165551514530655","type":"journal-article","created":{"date-parts":[[2014,4,12]],"date-time":"2014-04-12T02:26:51Z","timestamp":1397269611000},"page":"476-487","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":10,"title":["Automatic identification of light stop words for Persian information retrieval systems"],"prefix":"10.1177","volume":"40","author":[{"given":"Mohammad","family":"Sadeghi","sequence":"first","affiliation":[{"name":"Computer Science Department, University of Valladolid, Spain"}]},{"given":"Jes\u00fas","family":"Vegas","sequence":"additional","affiliation":[{"name":"Computer Science Department, University of Valladolid, Spain"}]}],"member":"179","published-online":{"date-parts":[[2014,4,11]]},"reference":[{"volume-title":"Information storage and retrieval","year":"1997","author":"Korfhage RR","key":"bibr1-0165551514530655"},{"key":"bibr2-0165551514530655","volume-title":"Managing gigabytes: Compressing and indexing documents and image","author":"Witten I","year":"1999","edition":"2"},{"volume-title":"Frequency analysis of English usage: Lexicon and grammar","year":"1982","author":"Francis W","key":"bibr3-0165551514530655"},{"key":"bibr4-0165551514530655","volume-title":"Information retrieval","author":"Van Rijsbergen CJ","year":"1979","edition":"2"},{"key":"bibr5-0165551514530655","unstructured":"English Stop Word List in WordNet, http:\/\/www.d.umn.edu\/~tpederse\/Group01\/WordNet\/words.txt (2013, accessed May 2013)."},{"key":"bibr6-0165551514530655","unstructured":"List of English stop words, http:\/\/norm.al\/2009\/04\/14\/list-of-english-stop-words\/ (2013, accessed May 2013)."},{"key":"bibr7-0165551514530655","unstructured":"Internet world stats, http:\/\/www.internetworldstats.com\/ (2012, December 2012)."},{"volume-title":"Proceedings of the fifth Dutch\u2013Belgian information retrieval workshop","year":"2005","author":"Lo 
RT-W","key":"bibr8-0165551514530655"},{"first-page":"1010","volume-title":"Proceedings of the 5th WSEAS international conference on applied computer science","author":"Zou F","key":"bibr9-0165551514530655"},{"issue":"8","key":"bibr10-0165551514530655","volume":"46","author":"Alajmi A","year":"2012","journal-title":"International Journal of Computer Applications"},{"issue":"3","key":"bibr11-0165551514530655","volume":"4","author":"El-Khair IA","year":"2006","journal-title":"International Journal of Computing and Information Sciences"},{"journal-title":"Information Science Research Institute, University of Nevada, Las Vegas, NV","year":"2003","author":"Taghva K","key":"bibr12-0165551514530655"},{"volume-title":"Proceedings of SDIUT-03: the symposium on document image understanding technology","author":"Taghva K","key":"bibr13-0165551514530655"},{"key":"bibr14-0165551514530655","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2009.05.002"},{"volume-title":"Proceedings of 2007 IEEE\/ACS international conference on computer systems and applications (AICCSA 2007)","author":"Esmaili KS","key":"bibr15-0165551514530655"},{"key":"bibr16-0165551514530655","doi-asserted-by":"publisher","DOI":"10.1108\/07378830910988559"},{"key":"bibr17-0165551514530655","first-page":"61","volume":"7","author":"Kalbasi I","year":"1990","journal-title":"Journal of Linguistics"},{"volume-title":"The derivational structure of word in modern Farsi","year":"2001","author":"Kalbasi I","key":"bibr18-0165551514530655"},{"journal-title":"Memoranda in Computer and Cognitive Science","year":"2000","author":"Megerdoomian K","key":"bibr19-0165551514530655"},{"volume-title":"11th Computer Society of Iran, computer conference","year":"2006","author":"Qasemizadeh B","key":"bibr20-0165551514530655"},{"journal-title":"Arabic language processing: Status and prospects, ACL\/EACL","year":"2001","author":"Rezaie S.","key":"bibr21-0165551514530655"},{"key":"bibr22-0165551514530655","unstructured":"Safavi K. 
\u062f\u0631\u0622\u0645\u062f\u06cc \u0628\u0631 \u0632\u0628\u0627\u0646 \u0634\u0646\u0627\u0633\u06cc. Tehran: BongahTarjomevaNashr, 1981."},{"key":"bibr23-0165551514530655","unstructured":"Bateni MR. \u062a\u0648\u0635\u06cc\u0641 \u0633\u0627\u062e\u062a\u0627\u0631\u06cc \u062f\u0633\u062a\u0648\u0631\u06cc \u0632\u0628\u0627\u0646 \u0641\u0627\u0631\u0633\u06cc \u0628\u0631 \u0628\u0646\u06cc\u0627\u062f \u06cc\u06a9 \u0646\u0638\u0631\u06cc\u0647 \u0639\u0645\u0648\u0645\u06cc \u0632\u0628\u0627\u0646, Tehran: Amir Kabir, 2003."},{"volume-title":"Human behaviours and the principle of least effort","year":"1949","author":"Zipf H","key":"bibr24-0165551514530655"},{"key":"bibr25-0165551514530655","doi-asserted-by":"publisher","DOI":"10.1147\/rd.14.0309"},{"key":"bibr26-0165551514530655","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630270302"},{"key":"bibr27-0165551514530655","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1948.tb01338.x"},{"key":"bibr28-0165551514530655","unstructured":"Myerson RB. Fundamentals of social choice theory. Discussion Paper no. 
1162, 1996."},{"key":"bibr29-0165551514530655","unstructured":"Wikipedia, http:\/\/en.wikipedia.org\/wiki\/Most_common_words_in_English (2013, accessed February 2013)."}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551514530655","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0165551514530655","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551514530655","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T11:17:42Z","timestamp":1741000662000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0165551514530655"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,4,11]]},"references-count":29,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,8]]}},"alternative-id":["10.1177\/0165551514530655"],"URL":"https:\/\/doi.org\/10.1177\/0165551514530655","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"type":"print","value":"0165-5515"},{"type":"electronic","value":"1741-6485"}],"subject":[],"published":{"date-parts":[[2014,4,11]]}}}