{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T17:53:36Z","timestamp":1754157216304,"version":"3.41.2"},"reference-count":34,"publisher":"Emerald","issue":"4","license":[{"start":{"date-parts":[[2010,7,27]],"date-time":"2010-07-27T00:00:00Z","timestamp":1280188800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,7,27]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-heading\">Purpose<\/jats:title><jats:p>The term selection problem for selecting query terms in information filtering and routing has been investigated using hill\u2010climbers of various kinds, largely through the Okapi experiments in the TREC series of conferences. Although these are simple deterministic approaches, which examine the effect of changing the weight of one term at a time, they have been shown to improve the retrieval effectiveness of filtering queries in these TREC experiments. Hill\u2010climbers are, however, likely to get trapped in local optima, and the use of more sophisticated local search techniques for this problem that attempt to break out of these optima are worth investigating. To this end, this paper aims to apply a genetic algorithm (GA) to the same problem.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Design\/methodology\/approach<\/jats:title><jats:p>A standard TREC test collection is used from the TREC\u20108 filtering track, recording mean average precision and recall measures to allow comparison between the hill\u2010climber and GAs. It also varies elements of the GA, such as probability of a word being included, probability of mutation and population size in order to measure the effect of these variables. 
Different strategies such as elitist and non\u2010elitist methods are used, as well as roulette wheel and rank selection GAs.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Findings<\/jats:title><jats:p>The results of tests suggest that both techniques are, on average, better than the baseline, but the implemented GA does not match the overall performance of a hill\u2010climber. The Rank selection algorithm does better on average than the Roulette Wheel algorithm. There is no evidence in this study that varying word inclusion probability, mutation probability or Elitist method makes much difference to the overall results. Small population sizes do not appear to be as effective as larger population sizes.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Research limitations\/implications<\/jats:title><jats:p>The evidence provided here would suggest that being stuck in a local optimum for the term selection optimization problem does not appear to be detrimental to the overall success of the hill\u2010climber. 
The evidence from term rank order would appear to provide extra useful evidence, which hill climbers can use efficiently, and effectively, to narrow the search space.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Originality\/value<\/jats:title><jats:p>The paper represents the first attempt to compare hill\u2010climbers with GAs on a problem of this type.<\/jats:p><\/jats:sec>","DOI":"10.1108\/00220411011052939","type":"journal-article","created":{"date-parts":[[2010,7,17]],"date-time":"2010-07-17T07:04:27Z","timestamp":1279350267000},"page":"513-531","source":"Crossref","is-referenced-by-count":3,"title":["An experimental comparison of a genetic algorithm and a hill\u2010climber for term selection"],"prefix":"10.1108","volume":"66","author":[{"given":"A.","family":"MacFarlane","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"A.","family":"Secker","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"P.","family":"May","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"J.","family":"Timmis","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"140","reference":[{"key":"key2022012920052257700_b1","unstructured":"Beaulieu, M., Gatford, M., Huang, X., Robertson, S., Walker, S. and Williams, P. (1997), \u201cOkapi at TREC\u20105\u201d, in Voorhees, E. and Harman, D. (Eds), Proceedings of the Fifth Text Retrieval Conference, Gaithersburg, November 1996, NIST SP 500\u2010238, pp. 143\u201066."},{"key":"key2022012920052257700_b2","doi-asserted-by":"crossref","unstructured":"Boughanem, M., Chrisment, C. and Tamine, L. (2002), \u201cOn using genetic algorithms for multimodel relevance optimisation in information retrieval\u201d, Journal of the American Society for Information Science and Technology, Vol. 53 No. 11, pp. 
934\u201042.","DOI":"10.1002\/asi.10119"},{"key":"key2022012920052257700_b3","doi-asserted-by":"crossref","unstructured":"Chang, Y. and Chen, S. (2006), \u201cA new query reweighting method for document retrieval based on genetic algorithms\u201d, IEEE Transactions on Evolutionary Computation, Vol. 10 No. 5, pp. 617\u201022.","DOI":"10.1109\/TEVC.2005.863130"},{"key":"key2022012920052257700_b4","doi-asserted-by":"crossref","unstructured":"Chen, H. (1995), \u201cMachine learning for information retrieval: neural networks, symbolic learning, and genetic algorithms\u201d, Journal of the American Society for Information Science and Technology, Vol. 46 No. 3, pp. 194\u2010216.","DOI":"10.1002\/(SICI)1097-4571(199504)46:3<194::AID-ASI4>3.0.CO;2-S"},{"key":"key2022012920052257700_b5","doi-asserted-by":"crossref","unstructured":"Chen, H., Shankaranarayanan, G. and She, L. (1998), \u201cA machine learning approach to inductive query by examples: an experiment using relevance feedback, ID3, genetic algorithms and simulated annealing\u201d, Journal of the American Society for Information Science and Technology, Vol. 49 No. 8, pp. 693\u2010705.","DOI":"10.1002\/(SICI)1097-4571(199806)49:8<693::AID-ASI4>3.0.CO;2-O"},{"key":"key2022012920052257700_b6","doi-asserted-by":"crossref","unstructured":"Fan, W., Gordon, M.D. and Pathak, P. (2004), \u201cA generic ranking function discovery framework by genetic programming for information retrieval\u201d, Information Processing and Management, Vol. 40 No. 4, pp. 587\u2010602.","DOI":"10.1016\/j.ipm.2003.08.001"},{"key":"key2022012920052257700_b7","unstructured":"Goldberg, D.E. (1989), Genetic Algorithms in Search, Optimization, and Machine Learning, Addison\u2010Wesley, Harlow."},{"key":"key2022012920052257700_b8","unstructured":"Harman, D. (1992), \u201cRelevance feedback and other query modification techniques\u201d, in Frakes, W. and Baeza\u2010Yates, R. 
(Eds), Information Retrieval: Data Structures and Algorithms, Prentice\u2010Hall, Englewood Cliffs, NJ, pp. 241\u201063."},{"key":"key2022012920052257700_b9","unstructured":"Harman, D., Fox, E., Baeza\u2010Yates, R. and Lee, W. (1992), \u201cInverted files\u201d, in Frakes, W. and Baeza\u2010Yates, R. (Eds), Information Retrieval: Data Structures and Algorithms, Prentice\u2010Hall, Englewood Cliffs, NJ, pp. 28\u201043."},{"key":"key2022012920052257700_b10","doi-asserted-by":"crossref","unstructured":"Horng, J. and Yeh, C. (2000), \u201cApplying genetic algorithms to query optimization in document retrieval\u201d, Information Processing and Management, Vol. 36 No. 5, pp. 737\u201059.","DOI":"10.1016\/S0306-4573(00)00008-X"},{"key":"key2022012920052257700_b11","unstructured":"Hull, D. and Robertson, S. (2000), \u201cThe TREC\u20108 filtering track final report\u201d, in Voorhees, E.M. and Harman, D. (Eds), Proceedings of the Eighth Text Retrieval Conference, Gaithersburg, November 1999, NIST SP 500\u2010246, pp. 35\u201055."},{"key":"key2022012920052257700_b12","doi-asserted-by":"crossref","unstructured":"Lopez\u2010Pujalte, C., Guerrero Bote, V. and de Moy Anegon, F. (2002), \u201cA test of genetic algorithms in relevance feedback\u201d, Information Processing and Management, Vol. 38 No. 6, pp. 793\u2010805.","DOI":"10.1016\/S0306-4573(01)00061-9"},{"key":"key2022012920052257700_b13","doi-asserted-by":"crossref","unstructured":"Lopez\u2010Pujalte, C., Guerrero Bote, V. and de Moy Anegon, F. (2003a), \u201cGenetic algorithms in relevance feedback: a second test and new contributions\u201d, Information Processing and Management, Vol. 39 No. 5, pp. 669\u201087.","DOI":"10.1016\/S0306-4573(02)00044-4"},{"key":"key2022012920052257700_b14","doi-asserted-by":"crossref","unstructured":"Lopez\u2010Pujalte, C., Guerrero Bote, V. and de Moy Anegon, F. 
(2003b), \u201cOrder\u2010based fitness functions for genetic algorithms applied to relevance feedback\u201d, Journal of the American Society for Information Science and Technology, Vol. 54 No. 2, pp. 152\u201060.","DOI":"10.1002\/asi.10179"},{"key":"key2022012920052257700_b15","unstructured":"MacFarlane, A. (2000), \u201cDistributed inverted files and performance: a study of parallelism and data distribution methods in IR\u201d, PhD thesis, City University, London."},{"key":"key2022012920052257700_b17","doi-asserted-by":"crossref","unstructured":"MacFarlane, A. and Tuson, A. (2008), \u201cLocal search: a guide for the information retrieval practitioner\u201d, Information Processing and Management, Vol. 45 No. 1, pp. 159\u201074.","DOI":"10.1016\/j.ipm.2008.09.002"},{"key":"key2022012920052257700_b16","doi-asserted-by":"crossref","unstructured":"MacFarlane, A., Robertson, S. and McCann, J. (2003), \u201cParallel computing for term selection in routing\/filtering\u201d, in Sebastiani, F. (Ed.), Proceeding of the 25th European Conference on IR Research, ECIR 2003, Pisa, LNCS 2633, Springer\u2010Verlag, Berlin, pp. 537\u201045.","DOI":"10.1007\/3-540-36618-0_40"},{"key":"key2022012920052257700_b19","doi-asserted-by":"crossref","unstructured":"Martin, J. and Shackleton, M. (2003), \u201cInvestigation of the importance of the genotype\u2010phenotype mapping in information retrieval\u201d, Future Generation Computer Systems, Vol. 19 No. 1, pp. 55\u201068.","DOI":"10.1016\/S0167-739X(02)00108-5"},{"key":"key2022012920052257700_b18","doi-asserted-by":"crossref","unstructured":"Martin\u2010Bautista, M.J., Vila, M.A. and Larsen, H.L. (1999), \u201cA fuzzy genetic algorithm approach to an adaptive information retrieval agent\u201d, Journal of the American Society for Information Science, Vol. 50 No. 9, pp. 760\u201071.","DOI":"10.1002\/(SICI)1097-4571(1999)50:9<760::AID-ASI4>3.0.CO;2-O"},{"key":"key2022012920052257700_b20","unstructured":"Mitchell, M. 
(1999), An Introduction to Genetic Algorithms, 6th ed., MIT Press, Cambridge, MA."},{"key":"key2022012920052257700_b21","doi-asserted-by":"crossref","unstructured":"Robertson, A. and Willett, P. (1996), \u201cAn upperbound to the performance of ranked\u2010output searching: optimal weighting of query terms using a genetic algorithm\u201d, Journal of Documentation, Vol. 52 No. 4, pp. 405\u201020.","DOI":"10.1108\/eb026973"},{"key":"key2022012920052257700_b23","doi-asserted-by":"crossref","unstructured":"Robertson, S. (1990), \u201cOn term selection for query expansion, documentation note\u201d, Journal of Documentation, Vol. 46 No. 4, pp. 359\u201064.","DOI":"10.1108\/eb026866"},{"key":"key2022012920052257700_b24","doi-asserted-by":"crossref","unstructured":"Robertson, S. (1997), \u201cOverview of the Okapi projects: special issue\u201d, Journal of Documentation, Vol. 53 No. 1, pp. 3\u20107.","DOI":"10.1108\/EUM0000000007186"},{"key":"key2022012920052257700_b22","doi-asserted-by":"crossref","unstructured":"Robertson, S. and Sparck\u2010Jones, K. (1976), \u201cRelevance weighting of search terms\u201d, Journal of the American Society for Information Science, Vol. 27, pp. 129\u201046.","DOI":"10.1002\/asi.4630270302"},{"key":"key2022012920052257700_b26","doi-asserted-by":"crossref","unstructured":"Robertson, S., Walker, S. and Hancock\u2010 Beaulieu, M. (1995), \u201cLarge test collection experiments on an operational interactive system: Okapi at TREC\u201d, Information Processing and Management, Vol. 31 No. 3, pp. 345\u201060.","DOI":"10.1016\/0306-4573(94)00051-4"},{"key":"key2022012920052257700_b27","unstructured":"Robertson, S., Walker, S., Beaulieu, M., Gatford, M. and Payne, A. (1996), \u201cOkapi at TREC\u20104\u201d, in Harman, D. (Ed.), Proceedings of the Fourth Text Retrieval Conference, Gaithersburg, November 1995, NIST SP 500\u2010236, pp. 
73\u201096."},{"key":"key2022012920052257700_b25","unstructured":"Robertson, S., Walker, S., Jones, S., Hancock\u2010 Beaulieu, M. and Gatford, M. (1995), \u201cOkapi at TREC\u20103\u201d, in Harman, D. (Ed.), Proceedings of the Third Text Retrieval Conference, Gaithersburg, November 1994, NIST SP 500\u2010226, pp. 109\u201026."},{"key":"key2022012920052257700_b28","unstructured":"Rozsypal, A. and Kubat, M. (2001), \u201cUsing the genetic algorithm to reduce the size of a nearest\u2010neighbour classifier and to select relevant attributes\u201d, paper presented at the 18th International Conference on Machine Learning (ICML 2001), Cambridge, MA."},{"key":"key2022012920052257700_b29","doi-asserted-by":"crossref","unstructured":"Sebastiani, F. (2002), \u201cMachine learning in automated text categorization\u201d, ACM Computing Surveys, Vol. 34 No. 1, pp. 1\u201047.","DOI":"10.1145\/505282.505283"},{"key":"key2022012920052257700_b30","doi-asserted-by":"crossref","unstructured":"Tamine, L., Chrisment, C. and Boughanem, M. (2003), \u201cMultiple query evaluation based on enhanced genetic algorithm\u201d, Information Processing and Management, Vol. 39 No. 2, pp. 21\u2010231.","DOI":"10.1016\/S0306-4573(02)00048-1"},{"key":"key2022012920052257700_b31","unstructured":"Tuson, A. (1998), \u201cOptimisation with hillclimbing on steroids: an overview of neighbourhood search techniques\u201d, paper presented at the 10th Young OR Conference, Operational Research Society, Birmingham."},{"key":"key2022012920052257700_b32","doi-asserted-by":"crossref","unstructured":"Vrajitoru, D. (1998), \u201cCrossover improvement for the genetic algorithm in information retrieval\u201d, Information Processing and Management, Vol. 34 No. 4, pp. 405\u201015.","DOI":"10.1016\/S0306-4573(98)00015-6"},{"key":"key2022012920052257700_b33","unstructured":"Walker, S., Robertson, S. and Boughanem, M. (1998), \u201cOkapi at TREC\u20106: automatic ad hoc, VLC, routing and filtering\u201d, in Voorhees, E. 
and Harman, D. (Eds), Proceedings of the Fifth Text Retrieval Conference, Gaithersburg, November 1996, NIST SP 500\u2010240, pp. 125\u201036."},{"key":"key2022012920052257700_b34","unstructured":"Yang, J.J. and Korfhage, R. (1994), \u201cQuery modifications using genetic algorithms in vector space models\u201d, International Journal of Expert Systems, Vol. 7 No. 2, pp. 165\u201091."}],"container-title":["Journal of Documentation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/00220411011052939\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/00220411011052939\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T23:38:14Z","timestamp":1753400294000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/jd\/article\/66\/4\/513-531\/205751"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,7,27]]},"references-count":34,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2010,7,27]]}},"alternative-id":["10.1108\/00220411011052939"],"URL":"https:\/\/doi.org\/10.1108\/00220411011052939","relation":{},"ISSN":["0022-0418"],"issn-type":[{"type":"print","value":"0022-0418"}],"subject":[],"published":{"date-parts":[[2010,7,27]]}}}