{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T00:29:15Z","timestamp":1761611355637,"version":"3.41.0"},"reference-count":67,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2014,4,1]],"date-time":"2014-04-01T00:00:00Z","timestamp":1396310400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000143","name":"Division of Computing and Communication Foundations","doi-asserted-by":"publisher","award":["CCF-1017623\/CCF-1218904"],"award-info":[{"award-number":["CCF-1017623\/CCF-1218904"]}],"id":[{"id":"10.13039\/100000143","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001868","name":"National Science Council Taiwan","doi-asserted-by":"publisher","award":["99-2221-E-007-123 and 102-2221-E-007-068"],"award-info":[{"award-number":["99-2221-E-007-123 and 102-2221-E-007-068"]}],"id":[{"id":"10.13039\/501100001868","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. ACM"],"published-print":{"date-parts":[[2014,4]]},"abstract":"<jats:p>\n            The inverted index is the backbone of modern web search engines. For each word in a collection of web documents, the index records the list of documents where this word occurs. Given a set of query words, the job of a search engine is to output a ranked list of the most relevant documents containing the query. However, if the query consists of an arbitrary string\u2014which can be a partial word, multiword phrase, or more generally any sequence of characters\u2014then word boundaries are no longer relevant and we need a different approach. In string retrieval settings, we are given a set\n            <jats:italic>D<\/jats:italic>\n            ={\n            <jats:italic>d<\/jats:italic>\n            <jats:sub>1<\/jats:sub>\n            ,\n            <jats:italic>d<\/jats:italic>\n            <jats:sub>2<\/jats:sub>\n            ,\n            <jats:italic>d<\/jats:italic>\n            <jats:sub>3<\/jats:sub>\n            , \u2026,\n            <jats:italic>d<\/jats:italic>\n            <jats:sub>D<\/jats:sub>\n            } of\n            <jats:italic>D<\/jats:italic>\n            strings with\n            <jats:italic>n<\/jats:italic>\n            characters in total taken from an alphabet set \u03a3 = [\u03c3], and the task of the search engine, for a given query pattern\n            <jats:italic>P<\/jats:italic>\n            of length\n            <jats:italic>p<\/jats:italic>\n            , is to report the \u201cmost relevant\u201d strings in\n            <jats:italic>D<\/jats:italic>\n            containing\n            <jats:italic>P<\/jats:italic>\n            . The query may also consist of two or more patterns. The notion of relevance can be captured by a function\n            <jats:italic>score<\/jats:italic>\n            (\n            <jats:italic>P<\/jats:italic>\n            ,\n            <jats:italic>d<\/jats:italic>\n            <jats:sub>\n              <jats:italic>r<\/jats:italic>\n            <\/jats:sub>\n            ), which indicates how relevant document\n            <jats:italic>d<\/jats:italic>\n            <jats:sub>\n              <jats:italic>r<\/jats:italic>\n            <\/jats:sub>\n            is to the pattern\n            <jats:italic>P<\/jats:italic>\n            . Some example score functions are the\n            <jats:italic>frequency<\/jats:italic>\n            of pattern occurrences,\n            <jats:italic>proximity<\/jats:italic>\n            between pattern occurrences, or pattern-independent\n            <jats:italic>PageRank<\/jats:italic>\n            of the document.\n          <\/jats:p>\n          <jats:p>\n            The first formal framework to study such kinds of retrieval problems was given by Muthukrishnan [SODA 2002]. He considered two metrics for relevance: frequency and proximity. He took a threshold-based approach on these metrics and gave data structures that use\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>n<\/jats:italic>\n            log\n            <jats:italic>n<\/jats:italic>\n            ) words of space. We study this problem in a somewhat more natural top-\n            <jats:italic>k<\/jats:italic>\n            framework. Here,\n            <jats:italic>k<\/jats:italic>\n            is a part of the query, and the top\n            <jats:italic>k<\/jats:italic>\n            most relevant (highest-scoring) documents are to be reported in sorted order of score. We present the first linear-space framework (i.e., using\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>n<\/jats:italic>\n            ) words of space) that is capable of handling arbitrary score functions with near-optimal\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>p<\/jats:italic>\n            +\n            <jats:italic>k<\/jats:italic>\n            log\n            <jats:italic>k<\/jats:italic>\n            ) query time. The query time can be made optimal\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>p<\/jats:italic>\n            +\n            <jats:italic>k<\/jats:italic>\n            ) if sorted order is not necessary. Further, we derive compact space and succinct space indexes (for some specific score functions). This space compression comes at the cost of higher query time. At last, we extend our framework to handle the case of multiple patterns. Apart from providing a robust framework, our results also improve many earlier results in index space or query time or both.\n          <\/jats:p>","DOI":"10.1145\/2590774","type":"journal-article","created":{"date-parts":[[2014,4,22]],"date-time":"2014-04-22T13:37:45Z","timestamp":1398173865000},"page":"1-36","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["Space-Efficient Frameworks for Top-\n            <i>k<\/i>\n            String Retrieval"],"prefix":"10.1145","volume":"61","author":[{"given":"Wing-Kai","family":"Hon","sequence":"first","affiliation":[{"name":"National Tsing Hua University, Taiwan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rahul","family":"Shah","sequence":"additional","affiliation":[{"name":"Louisiana State University, Baton Rouge, LA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sharma V.","family":"Thankachan","sequence":"additional","affiliation":[{"name":"Louisiana State University, Baton Rouge, LA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeffrey Scott","family":"Vitter","sequence":"additional","affiliation":[{"name":"The University of Kansas, Lawrence, KS"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2014,4,24]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/48529.48535"},{"volume-title":"Proceedings of the International Symposium on String Processing and Information Retrieval. 386--397","author":"Belazzougui D.","key":"e_1_2_1_2_1","unstructured":"D. Belazzougui and G. Navarro . 2011. Improved compressed indexes for full-text document retrieval . In Proceedings of the International Symposium on String Processing and Information Retrieval. 386--397 . D. Belazzougui and G. Navarro. 2011. Improved compressed indexes for full-text document retrieval. In Proceedings of the International Symposium on String Processing and Information Retrieval. 386--397."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jda.2012.07.005"},{"volume-title":"Proceedings of the Latin American Symposium on Theoretical Informatics. 88--94","author":"Bender M. A.","key":"e_1_2_1_4_1","unstructured":"M. A. Bender and M. Farach-Colton . 2000. The LCA problem revisited . In Proceedings of the Latin American Symposium on Theoretical Informatics. 88--94 . M. A. Bender and M. Farach-Colton. 2000. The LCA problem revisited. In Proceedings of the Latin American Symposium on Theoretical Informatics. 88--94."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-0000(73)80033-9"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/322123.322127"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1137\/0217026"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-013-9792-1"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2010.06.002"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007352.1007374"},{"volume-title":"Proceedings of the European Symposium on Algorithms. 194--205","author":"Culpepper J. S.","key":"e_1_2_1_12_1","unstructured":"J. S. Culpepper , G. Navarro , S. J. Puglisi , and A. Turpin . 2010. Top-k ranked document search in general text databases . In Proceedings of the European Symposium on Algorithms. 194--205 . J. S. Culpepper, G. Navarro, S. J. Puglisi, and A. Turpin. 2010. Top-k ranked document search in general text databases. In Proceedings of the European Symposium on Algorithms. 194--205."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348317"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/795663.796326"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1082036.1082039"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1240233.1240243"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-29344-3_28"},{"volume-title":"Proceedings of the Symposium on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. 459--470","author":"Fischer J.","key":"e_1_2_1_18_1","unstructured":"J. Fischer and V. Heun . 2007. A new succinct representation of RMQ-information and improvements in the enhanced suffix array . In Proceedings of the Symposium on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. 459--470 . J. Fischer and V. Heun. 2007. A new succinct representation of RMQ-information and improvements in the enhanced suffix array. In Proceedings of the Symposium on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. 459--470."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1137\/090779759"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1006\/inco.1993.1030"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/828.1884"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/795665.796479"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-38905-4_12"},{"volume-title":"Proceedings of the International Symposium on String Processing and Information Retrieval. 67--81","author":"Gagie T.","key":"e_1_2_1_24_1","unstructured":"T. Gagie , G. Navarro , and S. J. Puglisi . 2010. Colored range queries and document retrieval . In Proceedings of the International Symposium on String Processing and Information Retrieval. 67--81 . T. Gagie, G. Navarro, and S. J. Puglisi. 2010. Colored range queries and document retrieval. In Proceedings of the International Symposium on String Processing and Information Retrieval. 67--81."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2011.12.002"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03784-9_1"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/1109557.1109599"},{"volume-title":"Proceedings of the Symposium on Discrete Algorithms. 841--850","author":"Grossi R.","key":"e_1_2_1_28_1","unstructured":"R. Grossi , A. Gupta , and J. S. Vitter . 2003. High-order entropy-compressed text indexes . In Proceedings of the Symposium on Discrete Algorithms. 841--850 . R. Grossi, A. Gupta, and J. S. Vitter. 2003. High-order entropy-compressed text indexes. In Proceedings of the Symposium on Discrete Algorithms. 841--850."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539702402354"},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Wing-Kai Hon Manish Patil Rahul Shah Sharma V. Thankachan and Jeffrey Scott Vitter. 2013a. Indexes for document retrieval with relevance. In Space-Efficient Data Structures Streams and Algorithms. 351--362.  Wing-Kai Hon Manish Patil Rahul Shah Sharma V. Thankachan and Jeffrey Scott Vitter. 2013a. Indexes for document retrieval with relevance. In Space-Efficient Data Structures Streams and Algorithms. 351--362.","DOI":"10.1007\/978-3-642-40273-9_22"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jda.2010.08.003"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31265-6_14"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/1928328.1928336"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jda.2012.09.002"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31265-6_15"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2009.19"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/DCC.2013.42"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2488388.2488440"},{"volume-title":"Proceedings of the Symposium on Discrete Algorithms. 401--411","author":"Karpinski M.","key":"e_1_2_1_39_1","unstructured":"M. Karpinski and Y. Nekrich . 2011. Top-K color queries for document retrieval . In Proceedings of the Symposium on Discrete Algorithms. 401--411 . M. Karpinski and Y. Nekrich. 2011. Top-K color queries for document retrieval. In Proceedings of the Symposium on Discrete Algorithms. 401--411."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1137\/0206024"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/DCC.2013.43"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1137\/0222058"},{"volume-title":"Proceedings of the European Symposium on Algorithms. 67--78","author":"Matias Y.","key":"e_1_2_1_43_1","unstructured":"Y. Matias , S. Muthukrishnan , S. C. Sahinalp , and J. Ziv . 1998. Augmenting suffix trees, with applications . In Proceedings of the European Symposium on Algorithms. 67--78 . Y. Matias, S. Muthukrishnan, S. C. Sahinalp, and J. Ziv. 1998. Augmenting suffix trees, with applications. In Proceedings of the European Symposium on Algorithms. 67--78."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/321941.321946"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1006\/jagm.2000.1151"},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the Symposium on Discrete Algorithms. 657--666","author":"Muthukrishnan S.","year":"2002","unstructured":"S. Muthukrishnan . 2002 . Efficient algorithms for document retrieval problems . In Proceedings of the Symposium on Discrete Algorithms. 657--666 . S. Muthukrishnan. 2002. Efficient algorithms for document retrieval problems. In Proceedings of the Symposium on Discrete Algorithms. 657--666."},{"key":"e_1_2_1_47_1","unstructured":"Gonzalo Navarro. 2013. Spaces trees and colors: The algorithmic landscape of document retrieval on sequences. CoRR abs\/1304.6023.  Gonzalo Navarro. 2013. Spaces trees and colors: The algorithmic landscape of document retrieval on sequences. CoRR abs\/1304.6023."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1216370.1216372"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31155-0_24"},{"volume-title":"Proceedings of the Symposium on Discrete Algorithms. 1066--1077","author":"Navarro G.","key":"e_1_2_1_50_1","unstructured":"G. Navarro and Y. Nekrich . 2012b. Top-k document retrieval in optimal time and linear space . In Proceedings of the Symposium on Discrete Algorithms. 1066--1077 . G. Navarro and Y. Nekrich. 2012b. Top-k document retrieval in optimal time and linear space. In Proceedings of the Symposium on Discrete Algorithms. 1066--1077."},{"volume-title":"Proceedings of the Symposium on Experimental Algorithms. 193--205","author":"Navarro G.","key":"e_1_2_1_51_1","unstructured":"G. Navarro , S. J. Puglisi , and D. Valenzuela . 2011. Practical compressed document retrieval . In Proceedings of the Symposium on Experimental Algorithms. 193--205 . G. Navarro, S. J. Puglisi, and D. Valenzuela. 2011. Practical compressed document retrieval. In Proceedings of the Symposium on Experimental Algorithms. 193--205."},{"volume-title":"Proceedings of the International Symposium on String Processing and Information Retrieval, To appear.","author":"Navarro Gonzalo","key":"e_1_2_1_52_1","unstructured":"Gonzalo Navarro and Sharma V. Thankachan . 2013. Faster top-k document retrieval in optimal space . In Proceedings of the International Symposium on String Processing and Information Retrieval, To appear. Gonzalo Navarro and Sharma V. Thankachan. 2013. Faster top-k document retrieval in optimal space. In Proceedings of the International Symposium on String Processing and Information Retrieval, To appear."},{"key":"e_1_2_1_53_1","volume-title":"Technical Report 1999-66, Stanford InfoLab.","author":"Page L.","year":"1999","unstructured":"L. Page , S. Brin , R. Motwani , and T. Winograd . 1999 . The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab. L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539700369909"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2009992"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2008.83"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/1290672.1290680"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-006-1198-x"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jda.2006.03.011"},{"volume-title":"Proceedings of the Symposium on Discrete Algorithms. 134--149","author":"Sadakane K.","key":"e_1_2_1_60_1","unstructured":"K. Sadakane and G. Navarro . 2010. Fully-functional succinct trees . In Proceedings of the Symposium on Discrete Algorithms. 134--149 . K. Sadakane and G. Navarro. 2010. Fully-functional succinct trees. In Proceedings of the Symposium on Discrete Algorithms. 134--149."},{"volume-title":"Proceedings of the Symposium on Discrete Algorithms. 108--115","author":"Shah R.","key":"e_1_2_1_61_1","unstructured":"R. Shah and M. Farach-Colton . 2002. Undiscretized dynamic programming: Faster algorithms for facility location and related problems on trees . In Proceedings of the Symposium on Discrete Algorithms. 108--115 . R. Shah and M. Farach-Colton. 2002. Undiscretized dynamic programming: Faster algorithms for facility location and related problems on trees. In Proceedings of the Symposium on Discrete Algorithms. 108--115."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-40450-4_68"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2013.03.012"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-73437-6_22"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1561\/0400000014"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/SWAT.1973.13"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1016\/0020-0190(83)90075-3"},{"key":"e_1_2_1_68_1","volume-title":"Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan-Kaufmann","author":"Witten I.","year":"1999","unstructured":"I. Witten , A. Moffat , and T. Bell . 1999 . Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan-Kaufmann , San Francisco, CA . I. Witten, A. Moffat, and T. Bell. 1999. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan-Kaufmann, San Francisco, CA."}],"container-title":["Journal of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2590774","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2590774","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:10:24Z","timestamp":1750234224000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2590774"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,4]]},"references-count":67,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2014,4]]}},"alternative-id":["10.1145\/2590774"],"URL":"https:\/\/doi.org\/10.1145\/2590774","relation":{},"ISSN":["0004-5411","1557-735X"],"issn-type":[{"type":"print","value":"0004-5411"},{"type":"electronic","value":"1557-735X"}],"subject":[],"published":{"date-parts":[[2014,4]]},"assertion":[{"value":"2012-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-04-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}