{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:20:52Z","timestamp":1750306852391,"version":"3.41.0"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2014,6,1]],"date-time":"2014-06-01T00:00:00Z","timestamp":1401580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Graduate Scholarship from Bodossaki Foundation, Athens, Greece"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Web"],"published-print":{"date-parts":[[2014,6]]},"abstract":"<jats:p>Real-time search requires to incrementally ingest content updates and almost immediately make them searchable while serving search queries at low latency. This is currently feasible for datasets of moderate size by fully maintaining the index in the main memory of multiple machines. Instead, disk-based methods for incremental index maintenance substantially increase search latency with the index fragmented across multiple disk locations. For the support of fast search over disk-based storage, we take a fresh look at incremental text indexing in the context of current architectural features. We introduce a greedy method called Selective Range Flush (SRF) to contiguously organize the index over disk blocks and dynamically update it at low cost. We show that SRF requires substantial experimental effort to tune specific parameters for performance efficiency. Subsequently, we propose the Unified Range Flush (URF) method, which is conceptually simpler than SRF, achieves similar or better performance with fewer parameters and less tuning, and is amenable to I\/O complexity analysis. We implement interesting variations of the two methods in the Proteus prototype search engine that we developed and do extensive experiments with three different Web datasets of size up to 1TB. Across different systems, we show that our methods offer search latency that matches or reduces up to half the lowest achieved by existing disk-based methods. In comparison to an existing method of comparable search latency on the same system, our methods reduce by a factor of 2.0--2.4 the I\/O part of build time and by 21--24% the total build time.<\/jats:p>","DOI":"10.1145\/2560800","type":"journal-article","created":{"date-parts":[[2014,7,7]],"date-time":"2014-07-07T11:55:18Z","timestamp":1404734118000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Incremental Text Indexing for Fast Disk-Based Search"],"prefix":"10.1145","volume":"8","author":[{"given":"Giorgos","family":"Margaritis","sequence":"first","affiliation":[{"name":"University of Ioannina, Ioannina, Greece"}]},{"given":"Stergios V.","family":"Anastasiadis","sequence":"additional","affiliation":[{"name":"University of Ioannina, Ioannina, Greece"}]}],"member":"320","published-online":{"date-parts":[[2014,7,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148235"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/383034.383035"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367846"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277775"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2003.1196112"},{"volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies. 67--80","year":"2008","author":"Batsakis Alexandros","key":"e_1_2_1_6_1"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1868366.1868370"},{"key":"e_1_2_1_8_1","unstructured":"Allan Borodin and Ran El-Yaniv. 1998. Online Computation and Competitive Analysis. Cambridge University Press Cambridge UK.   Allan Borodin and Ran El-Yaniv. 1998. Online Computation and Competitive Analysis. Cambridge University Press Cambridge UK."},{"edition":"4","volume-title":"Readings in Database Systems","author":"Brewer Eric A.","key":"e_1_2_1_9_1"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/956863.956944"},{"volume-title":"Proceedings of the 20th International Conference on Very Large Data Bases. 192--202","author":"Brown Eric W.","key":"e_1_2_1_11_1"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2012.149"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-007-9042-8"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/11735106_21"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148233"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03784-9_19"},{"volume-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 205--218","author":"Chang Fay","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989391"},{"key":"e_1_2_1_19_1","unstructured":"Tzicker Chiueh and Lan Huang. 1999. Efficient real-time index updates in text retrieval systems. Tech. rep. 66 ECSL Stony Brook University Stony Brook NY.  Tzicker Chiueh and Lan Huang. 1999. Efficient real-time index updates in text retrieval systems. Tech. rep. 66 ECSL Stony Brook University Stony Brook NY."},{"key":"e_1_2_1_20_1","unstructured":"ClueWeb. 2009. The ClueWeb09 dataset. http:\/\/boston.lti.cs.cmu.edu\/Data\/clueweb09\/.  ClueWeb. 2009. The ClueWeb09 dataset. http:\/\/boston.lti.cs.cmu.edu\/Data\/clueweb09\/."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/96749.98245"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2408776.2408794"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2010.73"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1321440.1321545"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646010"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.10268"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-1286(00)00063-3"},{"key":"e_1_2_1_29_1","unstructured":"Raj Jain. 1991. The Art of Computer Systems Performance Analysis. Wiley.  Raj Jain. 1991. The Art of Computer Systems Performance Analysis. Wiley."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465351.2465367"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2038916.2038943"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1321440.1321457"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1099554.1099739"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1386118.1386125"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2005.09.005"},{"volume-title":"Proceedings of the Australasian Computer Science Conference. 15--23","year":"2004","author":"Lester Nicholas","key":"e_1_2_1_36_1"},{"volume-title":"Proceedings of the USENIX Conference on File and Storage Technologies. 153--166","author":"Leung Andrew W.","key":"e_1_2_1_37_1"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/775152.775167"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807128.1807138"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629575.1629590"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646012"},{"key":"e_1_2_1_42_1","unstructured":"Michael Mccandless Erik Hatcher and Otis Gospodnetic. 2010. Lucene in Action. Manning Publications Stamford CT.  Michael Mccandless Erik Hatcher and Otis Gospodnetic. 2010. Lucene in Action. Manning Publications Stamford CT."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/502115.502116"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277776"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/224056.224064"},{"volume-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 251--264","year":"2010","author":"Peng Daniel","key":"e_1_2_1_46_1"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb046814"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2168836.2168863"},{"volume-title":"Proceedings of the USENIX Annual Technical Conference. 171--184","author":"Shah Sam","key":"e_1_2_1_49_1"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277774"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/191839.191896"},{"key":"e_1_2_1_52_1","unstructured":"Trec. 2006. TREC terabyte track. National Institute of Standards and Technology. http:\/\/trec.nist.gov\/data\/terabyte.html.  Trec. 2006. TREC terabyte track. National Institute of Standards and Technology. http:\/\/trec.nist.gov\/data\/terabyte.html."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2012.43"},{"key":"e_1_2_1_54_1","unstructured":"Wikipedia. 2008. The wikipedia dataset. http:\/\/static.wikipedia.org\/downloads\/2008-06\/en\/.  Wikipedia. 2008. The wikipedia dataset. http:\/\/static.wikipedia.org\/downloads\/2008-06\/en\/."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/1028099.1028102"},{"key":"e_1_2_1_56_1","unstructured":"Wumpus. 2011. Wumpus search engine. http:\/\/www.wumpus-search.org.  Wumpus. 2011. Wumpus search engine. http:\/\/www.wumpus-search.org."},{"key":"e_1_2_1_57_1","unstructured":"Zettair. 2009. The Zettair search engine. RMIT university. http:\/\/www.seg.rmit.edu.au\/zettair\/.  Zettair. 2009. The Zettair search engine. RMIT university. http:\/\/www.seg.rmit.edu.au\/zettair\/."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/1458082.1458174"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/1132956.1132959"},{"volume-title":"Proceedings of the Australian Database Conference. 26--38","year":"1993","author":"Zobel Justin","key":"e_1_2_1_60_1"},{"key":"e_1_2_1_61_1","unstructured":"Zoie. 2008. Zoie real-time search and indexing system built on Apache Lucene. http:\/\/code.google.com\/p\/zoie\/wiki\/ZoieMergePolicy.  Zoie. 2008. Zoie real-time search and indexing system built on Apache Lucene. http:\/\/code.google.com\/p\/zoie\/wiki\/ZoieMergePolicy."}],"container-title":["ACM Transactions on the Web"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2560800","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2560800","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:10:21Z","timestamp":1750234221000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2560800"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,6]]},"references-count":61,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2014,6]]}},"alternative-id":["10.1145\/2560800"],"URL":"https:\/\/doi.org\/10.1145\/2560800","relation":{},"ISSN":["1559-1131","1559-114X"],"issn-type":[{"type":"print","value":"1559-1131"},{"type":"electronic","value":"1559-114X"}],"subject":[],"published":{"date-parts":[[2014,6]]},"assertion":[{"value":"2012-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-07-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}