{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:35:44Z","timestamp":1750307744940,"version":"3.41.0"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2008,3,1]],"date-time":"2008-03-01T00:00:00Z","timestamp":1204329600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["POSC\/EIA\/58194\/2004"],"award-info":[{"award-number":["POSC\/EIA\/58194\/2004"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003593","name":"Conselho Nacional de Desenvolvimento Cient\u00edfico e Tecnol\u00f3gico","doi-asserted-by":"publisher","award":["552.087\/02-5","303576\/2004-9303032\/2004-9","55.3126\/2005-9"],"award-info":[{"award-number":["552.087\/02-5","303576\/2004-9303032\/2004-9","55.3126\/2005-9"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2008,3]]},"abstract":"<jats:p>This article discusses a novel approach developed for static index pruning that takes into account the locality of occurrences of words in the text. We use this new approach to propose and experiment on simple and effective pruning methods that allow a fast construction of the pruned index. The methods proposed here are especially useful for pruning in environments where the document database changes continuously, such as large-scale web search engines. 
Extensive experiments are presented showing that the proposed methods can achieve high compression rates while maintaining the quality of results for the most common query types present in modern search engines, namely, conjunctive and phrase queries. In the experiments, our locality-based pruning approach allowed reducing search engine indices to 30% of their original size, with almost no reduction in precision at the top answers. Furthermore, we conclude that even an extremely simple locality-based pruning method can be competitive when compared to complex methods that do not rely on locality information.<\/jats:p>","DOI":"10.1145\/1344411.1344415","type":"journal-article","created":{"date-parts":[[2008,4,8]],"date-time":"2008-04-08T15:40:00Z","timestamp":1207669200000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Locality-Based pruning methods for web search"],"prefix":"10.1145","volume":"26","author":[{"given":"Edleno Silva de","family":"Moura","sequence":"first","affiliation":[{"name":"Federal University of Amazonas, Brazil"}]},{"given":"Celia Francisca dos","family":"Santos","sequence":"additional","affiliation":[{"name":"Federal University of Amazonas, Brazil"}]},{"given":"Bruno Dos santos de","family":"Araujo","sequence":"additional","affiliation":[{"name":"Federal University of Amazonas, Brazil"}]},{"given":"Altigran Soares da","family":"Silva","sequence":"additional","affiliation":[{"name":"Federal University of Amazonas, Brazil"}]},{"given":"Pavel","family":"Calado","sequence":"additional","affiliation":[{"name":"IST\/Inesc-ID, Porto Salvo, Portugal"}]},{"given":"Mario A.","family":"Nascimento","sequence":"additional","affiliation":[{"name":"University of Alberta, Edmonton AB, 
Canada"}]}],"member":"320","published-online":{"date-parts":[[2008,4,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/502115.502119"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Anderson T. W. and Finn J. D. 1997. The New Statistical Analysis of Data 1st ed. Springer.","DOI":"10.1007\/978-1-4612-2262-0"},{"volume-title":"Modern Information Retrieval","author":"Baeza-Yates R.","key":"e_1_2_1_3_1","unstructured":"Baeza-Yates, R. and Ribeiro-Neto, B. 1999. Modern Information Retrieval. Addison-Wesley."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/564376.564415"},{"key":"e_1_2_1_5_1","unstructured":"Bell T. C. Cleary J. G. and Witten I. H. 1990. Text Compression. 
Prentice Hall."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(95)00052-I"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/792550.792552"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/635484.635486"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/188490.188589"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/383952.383958"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/958942.958945"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1060745.1060783"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/348751.348754"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/321510.321519"},{"volume-title":"Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 28--36","author":"Fagin R.","key":"e_1_2_1_16_1","unstructured":"Fagin, R., Kumar, R., and Sivakumar, D. 2003. Comparing top k lists. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 28--36."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-1286(99)00024-9"},{"volume-title":"Proceedings of the Text Retrieval Conference (TREC-8)","author":"Hawking D.","key":"e_1_2_1_18_1","unstructured":"Hawking, D., Voorhees, E., Bailey, P., and Craswell, N. 1999. Overview of Trec-8 Web track. In Proceedings of the Text Retrieval Conference (TREC-8). 
Gaithersburg, MD, 131--150."},{"volume-title":"Proceedings of the Text Retrieval Conference (TREC-7)","author":"Hawking D.","key":"e_1_2_1_19_1","unstructured":"Hawking, D., Craswell, N., and Thistlewaite, P. B. 1998. Overview of TREC-7 very large collection track. In Proceedings of the Text Retrieval Conference (TREC-7), Gaithersburg, MD, 91--104."},{"key":"e_1_2_1_20_1","volume-title":"-Y","author":"Hovy E. H.","year":"1998","unstructured":"Hovy, E. H. and Lin, C.-Y. 1998. Automated text summarization in SUMMARIST. In Advances in Automated Text Summarization, I. Mani and M. Maybury, eds. MIT Press, 81--94."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)","author":"Kleinberg J. M.","year":"1998","unstructured":"Kleinberg, J. M. 1998. Authoritative sources in a hyperlinked environment. In Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), San Francisco, CA, 668--677."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/584792.584854"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC)","volume":"2","author":"Mallett D.","unstructured":"Mallett, D., Elding, J., and Nascimento, M. A. 2004. Information-Content based sentence extraction for text summarization. In Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC), vol. 2. IEEE Computer Society, 214."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/502115.502116"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009934302807"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/383952.383956"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199610)47:10%3C749::AID-ASI3%3E3.3.CO;2-U"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988675"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1035134.1035161"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/383952.383987"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/160688.160693"},{"key":"e_1_2_1_32_1","volume-title":"Introduction to Modern Information Retrieval","author":"Salton G.","unstructured":"Salton, G. and McGill, M. J. 1983. Introduction to Modern Information Retrieval, 1st ed. McGraw-Hill.","edition":"1"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1076034.1076064"},{"key":"e_1_2_1_34_1","volume-title":"Tech. Rep. 14","author":"Silverstein C.","year":"1998","unstructured":"Silverstein, C., Henzinger, M., Marais, H., and Moricz, M. 1998. Analysis of a very large Altavista query log. Tech. Rep. 
14, Systems Research Center Laboratory. October."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/331403.331405"},{"key":"e_1_2_1_36_1","volume-title":"Managing Gigabytes: Compressing and Indexing Documents and Images","author":"Witten I. H.","year":"1999","unstructured":"Witten, I. H., Moffat, A., and Bell, T. C. 1999. Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd ed. Morgan Kaufmann.","edition":"2"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1344411.1344415","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1344411.1344415","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:39:04Z","timestamp":1750253944000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1344411.1344415"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,3]]},"references-count":36,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2008,3]]}},"alternative-id":["10.1145\/1344411.1344415"],"URL":"https:\/\/doi.org\/10.1145\/1344411.1344415","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"type":"print","value":"1046-8188"},{"type":"electronic","value":"1558-2868"}],"subject":[],"published":{"date-parts":[[2008,3]]},"assertion":[{"value":"2005-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2007-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2008-04-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}