{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:40:04Z","timestamp":1750282804172,"version":"3.41.0"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2001,7,1]],"date-time":"2001-07-01T00:00:00Z","timestamp":993945600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2001,7]]},"abstract":"<jats:p>We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time. We also propose a storage scheme for creating and managing inverted files using an embedded database system. We suggest and compare different strategies for collecting global statistics from distributed inverted indexes. Finally, we present performance results from experiments on a testbed distributed Web indexing system that we have implemented.<\/jats:p>","DOI":"10.1145\/502115.502116","type":"journal-article","created":{"date-parts":[[2002,7,27]],"date-time":"2002-07-27T11:29:00Z","timestamp":1027769340000},"page":"217-241","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":56,"title":["Building a distributed full-text index for the web"],"prefix":"10.1145","volume":"19","author":[{"given":"Sergey","family":"Melink","sequence":"first","affiliation":[{"name":"Stanford University, Computer Science Dept. Stanford, CA"}]},{"given":"Sriram","family":"Raghavan","sequence":"additional","affiliation":[{"name":"Stanford University, Computer Science Dept. Stanford, CA"}]},{"given":"Beverly","family":"Yang","sequence":"additional","affiliation":[{"name":"Stanford University, Computer Science Dept. Stanford, CA"}]},{"given":"Hector","family":"Garcia-Molina","sequence":"additional","affiliation":[{"name":"Stanford University, Computer Science Dept. Stanford, CA"}]}],"member":"320","published-online":{"date-parts":[[2001,7]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 21st International Conference on Research and Development in Information Retrieval (August), 290-297","author":"ANH V.N.","year":"1998","unstructured":"ANH , V.N. AND MOFFAT , A. 1998 . Compressed inverted files with reduced decoding overheads . In Proceedings of the 21st International Conference on Research and Development in Information Retrieval (August), 290-297 . 10.1145\/290941.291011 ANH,V.N.AND MOFFAT, A. 1998. Compressed inverted files with reduced decoding overheads. In Proceedings of the 21st International Conference on Research and Development in Information Retrieval (August), 290-297. 10.1145\/290941.291011"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90101-X"},{"key":"e_1_2_1_3_1","first-page":"30","volume-title":"Proceedings of ACM Conference on Research and Development in Information Retrieval (SIGIR), ACM Press","author":"BROWN E. W.","year":"1995","unstructured":"BROWN , E. W. 1995 . Fast evaluation of structured queries for information retrieval . In Proceedings of ACM Conference on Research and Development in Information Retrieval (SIGIR), ACM Press , New York, NY , 30 - 38 . 10.1145\/215206.215329 BROWN, E. W. 1995. Fast evaluation of structured queries for information retrieval. In Proceedings of ACM Conference on Research and Development in Information Retrieval (SIGIR), ACM Press, New York, NY, 30-38. 10.1145\/215206.215329"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the 20st International Conference on Very Large Databases (September), 192-202","author":"BROWN E.W.","year":"1994","unstructured":"BROWN , E.W. , CALLAN , J.P. , AND CROFT , W. B. 1994 . Fast incremental indexing for full-text information retrieval . In Proceedings of the 20st International Conference on Very Large Databases (September), 192-202 . BROWN,E.W.,CALLAN,J.P.,AND CROFT, W. B. 1994. Fast incremental indexing for full-text information retrieval. In Proceedings of the 20st International Conference on Very Large Databases (September), 192-202."},{"key":"e_1_2_1_5_1","volume-title":"4th International Conference on Extending Database Technology (March), 365-378","author":"BROWN E.W.","year":"1994","unstructured":"BROWN , E.W. , CALLAN , J.P. , CROFT , W.B. , AND MOSS , J. E. B. 1994 . Supporting full-text information retrieval with a persistent object store . In 4th International Conference on Extending Database Technology (March), 365-378 . BROWN,E.W.,CALLAN,J.P.,CROFT,W.B.,AND MOSS, J. E. B. 1994. Supporting full-text information retrieval with a persistent object store. In 4th International Conference on Extending Database Technology (March), 365-378."},{"key":"e_1_2_1_6_1","unstructured":"CCITT. 1988. Recommendation X.209 Specification of Basic Encoding Rules for Abstract Syntax Notation one (ASN. 1). CCITT. 1988. Recommendation X.209 Specification of Basic Encoding Rules for Abstract Syntax Notation one (ASN. 1)."},{"key":"e_1_2_1_7_1","first-page":"329","volume-title":"8th ACM Symposium on Parallel Algorithms and Architectures (June), ACM Press","author":"CHAKRABARTI S.","year":"1996","unstructured":"CHAKRABARTI , S. AND MUTHUKRISHNAN , S. 1996 . Resource scheduling for parallel database and scientific applications . In 8th ACM Symposium on Parallel Algorithms and Architectures (June), ACM Press , New York, NY , 329 - 335 . 10.1145\/237502.237577 CHAKRABARTI,S.AND MUTHUKRISHNAN, S. 1996. Resource scheduling for parallel database and scientific applications. In 8th ACM Symposium on Parallel Algorithms and Architectures (June), ACM Press, New York, NY, 329-335. 10.1145\/237502.237577"},{"volume-title":"The evolution of the web and implications for an incremental crawler. To appear in the 26th International Conference on Very Large Databases","author":"CHO J.","key":"e_1_2_1_8_1","unstructured":"CHO , J. AND GARCIA-MOLINA , H. 2000. The evolution of the web and implications for an incremental crawler. To appear in the 26th International Conference on Very Large Databases . CHO,J.AND GARCIA-MOLINA, H. 2000. The evolution of the web and implications for an incremental crawler. To appear in the 26th International Conference on Very Large Databases."},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 10th Australasian Database Conference (January).","author":"CRASWELL N.","year":"1999","unstructured":"CRASWELL , N. , HAWKING , D. , AND THISTLEWALTE , P. 1999 . Merging results from isolated search engines . In Proceedings of the 10th Australasian Database Conference (January). CRASWELL, N., HAWKING,D.,AND THISTLEWALTE, P. 1999. Merging results from isolated search engines. In Proceedings of the 10th Australasian Database Conference (January)."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 18th International Conference on Distributed Computing Systems.","author":"DE KRETSER O.","year":"1998","unstructured":"DE KRETSER , O. , MOFFAT , A. , SHIMMMIN , T. , AND ZOBEL , J. 1998 . Methodologies for distributed information retrieval . In Proceedings of the 18th International Conference on Distributed Computing Systems. DE KRETSER, O., MOFFAT, A., SHIMMMIN,T.,AND ZOBEL, J. 1998. Methodologies for distributed information retrieval. In Proceedings of the 18th International Conference on Distributed Computing Systems."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2275.357411"},{"volume-title":"Database System Implementation","author":"GARCIA-MOLINA H.","key":"e_1_2_1_12_1","unstructured":"GARCIA-MOLINA , H. , ULLMAN , J. , AND WIDOM , J. 2000. Database System Implementation . Prentice-Hall , Eaglewood Cliffs, NJ . GARCIA-MOLINA, H., ULLMAN,J.,AND WIDOM, J. 2000. Database System Implementation. Prentice-Hall, Eaglewood Cliffs, NJ."},{"key":"e_1_2_1_13_1","volume-title":"Proceedigns of the 3rd International Conference on Database and Expert System Applications (September), 72-77","author":"GORSSMAN D.A.","year":"1992","unstructured":"GORSSMAN , D.A. AND DRISCOLL , J. R. 1992 . Structuring text within a relation system . In Proceedigns of the 3rd International Conference on Database and Expert System Applications (September), 72-77 . GORSSMAN,D.A.AND DRISCOLL, J. R. 1992. Structuring text within a relation system. In Proceedigns of the 3rd International Conference on Database and Expert System Applications (September), 72-77."},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","unstructured":"GRAVANO L. CHANG K. GARCIA-MOLINA H. LAGOZE C. AND PAEPCKE A. 1997. STARTS-stanford protocol for internet retrieval and search. http:\/\/www-db.stanford.edu\/ gravano\/starts.html. GRAVANO L. CHANG K. GARCIA-MOLINA H. LAGOZE C. AND PAEPCKE A. 1997. STARTS-stanford protocol for internet retrieval and search. http:\/\/www-db.stanford.edu\/ gravano\/starts.html.","DOI":"10.1145\/253260.253299"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the Seventh Text Retrieval Conference (November), 91-104","author":"HAWKING D.","year":"1998","unstructured":"HAWKING , D. AND CRASWELL , N. 1998 . Overview of TREC-7 very large collection track . In Proceedings of the Seventh Text Retrieval Conference (November), 91-104 . HAWKING,D.AND CRASWELL, N. 1998. Overview of TREC-7 very large collection track. In Proceedings of the Seventh Text Retrieval Conference (November), 91-104."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 9th International World Wide Web Conference (May), 277-293","author":"HIRAI J.","year":"2000","unstructured":"HIRAI , J. , GARCIA-MOLINA , H. , AND PAEPCKE , A. , RAGHAVAN , S. 2000 . WebBase: A repository of web pages . In Proceedings of the 9th International World Wide Web Conference (May), 277-293 . HIRAI, J., GARCIA-MOLINA, H., AND PAEPCKE, A., RAGHAVAN, S. 2000. WebBase: A repository of web pages. In Proceedings of the 9th International World Wide Web Conference (May), 277-293."},{"key":"e_1_2_1_17_1","unstructured":"INKTOMI. 2000. Inktomi WebMap. http:\/\/www.inktomi.com\/webmap\/. INKTOMI. 2000. Inktomi WebMap. http:\/\/www.inktomi.com\/webmap\/."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.342125"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 7th International World Wide Web Conference.","author":"LAWRENCE S.","year":"1998","unstructured":"LAWRENCE , S. AND GILES , C. L. 1998 . Inquirus, the NECI meta search engine . In Proceedings of the 7th International World Wide Web Conference. LAWRENCE,S.AND GILES, C. L. 1998. Inquirus, the NECI meta search engine. In Proceedings of the 7th International World Wide Web Conference."},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1038\/21987","article-title":"Accessibility of information on the web","volume":"400","author":"LAWRENCE S.","year":"1999","unstructured":"LAWRENCE , S. AND GILES , C. L. 1999 . Accessibility of information on the web . Nature 400 , 107 - 109 . LAWRENCE,S.AND GILES, C. L. 1999. Accessibility of information on the web. Nature 400, 107-109.","journal-title":"Nature"},{"key":"e_1_2_1_21_1","first-page":"319","volume-title":"Proceedings of the 1st ACM-SIAM Symposium on Discrete Algorithms, ACM Press","author":"MANBER U.","year":"1990","unstructured":"MANBER , U. AND MYERS , G. 1990 . Suffix arrays: A new method for on-line string searches . In Proceedings of the 1st ACM-SIAM Symposium on Discrete Algorithms, ACM Press , New York, NY , 319 - 327 . MANBER,U.AND MYERS, G. 1990. Suffix arrays: A new method for on-line string searches. In Proceedings of the 1st ACM-SIAM Symposium on Discrete Algorithms, ACM Press, New York, NY, 319-327."},{"key":"e_1_2_1_22_1","first-page":"131","volume-title":"Proceedings of the ACM Conference on Research and Development in Information Retrieval (September), ACM Press","author":"MARTIN P.","year":"1986","unstructured":"MARTIN , P. , MACLEOD , I. A. , AND NORDIN , B. 1986 . Adesign of a distributed full text retrieval system . In Proceedings of the ACM Conference on Research and Development in Information Retrieval (September), ACM Press , New York, NY , 131 - 137 . 10.1145\/253168.253197 MARTIN, P., MACLEOD, I. A., AND NORDIN, B. 1986. Adesign of a distributed full text retrieval system. In Proceedings of the ACM Conference on Research and Development in Information Retrieval (September), ACM Press, New York, NY, 131-137. 10.1145\/253168.253197"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199508)46:7%3C537::AID-ASI7%3E3.0.CO;2-P"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/237496.237497"},{"key":"e_1_2_1_26_1","volume-title":"Berkeley DB. In Proceedings of the 1999 Summer Usenix Technical Conference (June).","author":"OLSON M.","year":"1999","unstructured":"OLSON , M. , BOSTIC , K. , AND SELTZER , M. 1999 . Berkeley DB. In Proceedings of the 1999 Summer Usenix Technical Conference (June). OLSON, M., BOSTIC, K., AND SELTZER, M. 1999. Berkeley DB. In Proceedings of the 1999 Summer Usenix Technical Conference (June)."},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1145\/276675.276695","volume-title":"Proceedings of the 3rd ACM Conference on Digital Libraries (June), ACM Press","author":"RIBEIRO-NETO B.","year":"1998","unstructured":"RIBEIRO-NETO , B. AND BARBOSA , R. 1998 . Query performance for tightly coupled distributed digital libraries . In Proceedings of the 3rd ACM Conference on Digital Libraries (June), ACM Press , New York, NY , 182 - 190 . 10.1145\/276675.276695 RIBEIRO-NETO,B.AND BARBOSA, R. 1998. Query performance for tightly coupled distributed digital libraries. In Proceedings of the 3rd ACM Conference on Digital Libraries (June), ACM Press, New York, NY, 182-190. 10.1145\/276675.276695"},{"key":"e_1_2_1_28_1","first-page":"105","volume-title":"Proceedings of the 22nd ACM Conference on Research and Development in Information Retrieval (August), ACM Press","author":"RIBEIRO-NETO B.","year":"1999","unstructured":"RIBEIRO-NETO , B. , MOURA , E.S. , NEUBERT , M.S. , AND ZIVIANI , N. 1999 . Efficient distributed algorithms to build inverted files . In Proceedings of the 22nd ACM Conference on Research and Development in Information Retrieval (August), ACM Press , New York, NY , 105 - 112 . 10.1145\/312624.312663 RIBEIRO-NETO, B., MOURA,E.S.,NEUBERT,M.S.,AND ZIVIANI, N. 1999. Efficient distributed algorithms to build inverted files. In Proceedings of the 22nd ACM Conference on Research and Development in Information Retrieval (August), ACM Press, New York, NY, 105-112. 10.1145\/312624.312663"},{"key":"e_1_2_1_29_1","volume-title":"Information Retrieval: Data Structures and Algorithms","author":"SALTON G.","year":"1989","unstructured":"SALTON , G. 1989 . Information Retrieval: Data Structures and Algorithms . Addison-Wesley , Reading, Massachussetts . SALTON, G. 1989. Information Retrieval: Data Structures and Algorithms. Addison-Wesley, Reading, Massachussetts."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 2nd International Conference on Parallel and Distributed Information Systems (January), 8-17","author":"TOMASIC A.","year":"1993","unstructured":"TOMASIC , A. AND GARCIA-MOLINA , H. 1993 a. Performance of inverted indices in shared-nothing distributed text document information retrieval systems . In Proceedings of the 2nd International Conference on Parallel and Distributed Information Systems (January), 8-17 . TOMASIC,A.AND GARCIA-MOLINA, H. 1993a. Performance of inverted indices in shared-nothing distributed text document information retrieval systems. In Proceedings of the 2nd International Conference on Parallel and Distributed Information Systems (January), 8-17."},{"issue":"3","key":"e_1_2_1_31_1","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1007\/BF01228671","article-title":"Query processing and inverted indices in shared-nothing document information retrieval systems","volume":"2","author":"TOMASIC A.","year":"1993","unstructured":"TOMASIC , A. AND GARCIA-MOLINA , H. 1993 b. Query processing and inverted indices in shared-nothing document information retrieval systems . VLDB Journal 2 , 3 , 243 - 275 . TOMASIC,A.AND GARCIA-MOLINA, H. 1993b. Query processing and inverted indices in shared-nothing document information retrieval systems. VLDB Journal 2, 3, 243-275.","journal-title":"VLDB Journal"},{"key":"e_1_2_1_32_1","first-page":"289","volume-title":"Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (May), ACM Press","author":"TOMASIC A.","year":"1994","unstructured":"TOMASIC , A. , GARCIA-MOLINA , H. , AND SHOENS , K. 1994 . Incremental update of inverted list for text document retrieval . In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (May), ACM Press , New York, NY , 289 - 300 . 10.1145\/191839.191896 TOMASIC, A., GARCIA-MOLINA, H., AND SHOENS, K. 1994. Incremental update of inverted list for text document retrieval. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (May), ACM Press, New York, NY, 289-300. 10.1145\/191839.191896"},{"key":"e_1_2_1_33_1","first-page":"157","volume-title":"32nd Southeast Conference of the ACM, ACM Press","author":"VILES C. L.","year":"1994","unstructured":"VILES , C. L. 1994 . Maintaining state in a distributed information retrieval system . In 32nd Southeast Conference of the ACM, ACM Press , New York, NY , 157 - 161 . VILES, C. L. 1994. Maintaining state in a distributed information retrieval system. In 32nd Southeast Conference of the ACM, ACM Press, New York, NY, 157-161."},{"key":"e_1_2_1_34_1","first-page":"12","volume-title":"Proceedigns of the 18th International ACM Conference on Research and Development in Information Retrieval (July), ACM Press","author":"VILES C.L.","year":"1995","unstructured":"VILES , C.L. AND FRENCH , J. C. 1995 . Dissemination of collection wide information in a distributed information retrieval system . In Proceedigns of the 18th International ACM Conference on Research and Development in Information Retrieval (July), ACM Press , New York, NY , 12 - 20 . 10.1145\/215206.215327 VILES,C.L.AND FRENCH, J. C. 1995. Dissemination of collection wide information in a distributed information retrieval system. In Proceedigns of the 18th International ACM Conference on Research and Development in Information Retrieval (July), ACM Press, New York, NY, 12-20. 10.1145\/215206.215327"},{"key":"e_1_2_1_35_1","volume-title":"Managing Gigabytes: Compressing and Indexing Documents and Images","author":"WITTEN I. H.","year":"1999","unstructured":"WITTEN , I. H. , MOFFAT , A. , AND BELL , T. C. 1999 . Managing Gigabytes: Compressing and Indexing Documents and Images ( 2 nd ed.). Morgan Kauffman Publishing , San Francisco . WITTEN, I. H., MOFFAT, A., AND BELL, T. C. 1999. Managing Gigabytes: Compressing and Indexing Documents and Images (2nd ed.). Morgan Kauffman Publishing, San Francisco.","edition":"2"},{"key":"e_1_2_1_36_1","first-page":"352","volume-title":"18th International Conference on Very Large Databases (August","author":"ZOBEL J.","year":"1992","unstructured":"ZOBEL , J. , MOFFAT , A. , AND SACKS-DAVIS , R. 1992 . An efficient indexing technique for full-text database systems . In 18th International Conference on Very Large Databases (August 1992), pp. 352 - 362 . ZOBEL, J., MOFFAT, A., AND SACKS-DAVIS, R. 1992. An efficient indexing technique for full-text database systems. In 18th International Conference on Very Large Databases (August 1992), pp. 352-362."}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/502115.502116","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/502115.502116","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:15:13Z","timestamp":1750281313000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/502115.502116"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2001,7]]},"references-count":35,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2001,7]]}},"alternative-id":["10.1145\/502115.502116"],"URL":"https:\/\/doi.org\/10.1145\/502115.502116","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"type":"print","value":"1046-8188"},{"type":"electronic","value":"1558-2868"}],"subject":[],"published":{"date-parts":[[2001,7]]},"assertion":[{"value":"2001-07-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}