{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:38:54Z","timestamp":1750307934100,"version":"3.41.0"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2007,8,1]],"date-time":"2007-08-01T00:00:00Z","timestamp":1185926400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Database Syst."],"published-print":{"date-parts":[[2007,8]]},"abstract":"<jats:p>Large amounts of (often valuable) information are stored in web-accessible text databases. \u201cMetasearchers\u201d provide unified interfaces to query multiple such databases at once. For efficiency, metasearchers rely on succinct statistical summaries of the database contents to select the best databases for each query. So far, database selection research has largely assumed that databases are static, so the associated statistical summaries do not evolve over time. However, databases are rarely static and the statistical summaries that describe their contents need to be updated periodically to reflect content changes. In this article, we first report the results of a study showing how the content summaries of 152 real web databases evolved over a period of 52 weeks. Then, we show how to use \u201csurvival analysis\u201d techniques in general, and Cox's proportional hazards regression in particular, to model database changes over time and predict when we should update each content summary. 
Finally, we exploit our change model to devise update schedules that keep the summaries up to date by contacting databases only when needed, and then we evaluate the quality of our schedules experimentally over real web databases.<\/jats:p>","DOI":"10.1145\/1272743.1272744","type":"journal-article","created":{"date-parts":[[2007,9,14]],"date-time":"2007-09-14T13:44:55Z","timestamp":1189777495000},"page":"14","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Modeling and managing changes in text databases"],"prefix":"10.1145","volume":"32","author":[{"given":"Panagiotis G.","family":"Ipeirotis","sequence":"first","affiliation":[{"name":"New York University, New York, NY"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alexandros","family":"Ntoulas","sequence":"additional","affiliation":[{"name":"Microsoft Search Labs, Mountain View, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Junghoo","family":"Cho","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Luis","family":"Gravano","sequence":"additional","affiliation":[{"name":"Columbia University, New York, NY"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2007,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.3998\/3336451.0007.104"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Brewington B. E. and Cybenko G. 2000a. How dynamic is the web? In Proceedings of the 9th International World Wide Web Conference (WWW9). 
257--276.","DOI":"10.1016\/S1389-1286(00)00045-1"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.841784"},{"volume-title":"Adv. Inf. Retriev","author":"Callan J. P.","key":"e_1_2_1_4_1","unstructured":"Callan, J. P. 2000. Distributed information retrieval. In Adv. Inf. Retriev. Kluwer Academic Publishers, 127--150."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/382979.383040"},{"volume-title":"Mining the web. Morgan-Kaufmann","author":"Chakrabarti S.","key":"e_1_2_1_6_1","unstructured":"Chakrabarti, S. 2002. Mining the web. Morgan-Kaufmann, San Francisco, CA."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 26th International Conference on Very Large Databases (VLDB","author":"Cho J.","year":"2000","unstructured":"Cho, J. and Garc\u00eda-Molina, H. 2000. The evolution of the web and implications for an incremental crawler. In Proceedings of the 26th International Conference on Very Large Databases (VLDB 2000). 
200--209."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/857166.857170"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335391"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/1287369.1287414"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1099-1425(199806)1:1<15::AID-JOS3>3.0.CO;2-K"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1972.tb00899.x"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 1st USENIX Symposium on Internet Technologies and Systems (USITS","author":"Douglis F.","year":"1997","unstructured":"Douglis, F., Feldmann, A., Krishnamurthy, B., and Mogul, J. C. 1997. Rate of change and other metrics: A live study of the world wide web. In Proceedings of the 1st USENIX Symposium on Internet Technologies and Systems (USITS 1997). 16--31."},{"key":"e_1_2_1_14_1","unstructured":"Duda R. O. Hart P. E. and Stork D. G. 2000. Pattern Classification 2nd ed. Wiley New York."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/371920.371960"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/775152.775246"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/253260.253299"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/320248.320252"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/635484.635485"},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Hastie T. Tibshirani R. 
and Friedman J. H. 2001. The Elements of Statistical Learning. Springer-Verlag New York.","DOI":"10.1007\/978-0-387-21606-5"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 28th International Conference on Very Large Databases (VLDB","author":"Ipeirotis P. G.","year":"2002","unstructured":"Ipeirotis, P. G. and Gravano, L. 2002. Distributed search over the hidden web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB 2002). 394--405."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2005.91"},{"volume-title":"Statistical Methods for Speech Recognition","author":"Jelinek F.","key":"e_1_2_1_23_1","unstructured":"Jelinek, F. 1999. Statistical Methods for Speech Recognition. The MIT Press."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 2nd International Conference on Web-Age Information Management (WAIM","author":"Lim L.","year":"2001","unstructured":"Lim, L., Wang, M., Padmanabhan, S., Vitter, J. S., and Agarwal, R. C. 2001. Characterizing web document change. In Proceedings of the 2nd International Conference on Web-Age Information Management (WAIM 2001). 133--144."},{"volume-title":"Applied Statistics","author":"Marques De S\u00e1 J. P.","key":"e_1_2_1_25_1","unstructured":"Marques De S\u00e1, J. P. 2003. Applied Statistics. 
Springer-Verlag, New York."},{"key":"e_1_2_1_26_1","series-title":"Lecture Notes in Mathematics","volume-title":"Numerical Analysis","author":"Mor\u00e9 J. J.","unstructured":"Mor\u00e9, J. J. 1977. The Levenberg-Marquardt algorithm: Implementation and theory. In Numerical Analysis, Lecture Notes in Mathematics vol. 630, Springer-Verlag, New York. 105--116."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988674"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564701"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1060745.1060805"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860490"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/0197-2456(81)90005-2"},{"volume-title":"Proceedings of the 8th International World Wide Web Conference (WWW8). 1231--1243","author":"Wills C. E.","key":"e_1_2_1_32_1","unstructured":"Wills, C. E. and Mikhailov, M. 1999. Towards a better understanding of web resources and server responses for improved caching. In Proceedings of the 8th International World Wide Web Conference (WWW8). 
1231--1243."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/511446.511465"}],"container-title":["ACM Transactions on Database Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1272743.1272744","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1272743.1272744","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T14:52:03Z","timestamp":1750258323000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1272743.1272744"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,8]]},"references-count":33,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2007,8]]}},"alternative-id":["10.1145\/1272743.1272744"],"URL":"https:\/\/doi.org\/10.1145\/1272743.1272744","relation":{},"ISSN":["0362-5915","1557-4644"],"issn-type":[{"type":"print","value":"0362-5915"},{"type":"electronic","value":"1557-4644"}],"subject":[],"published":{"date-parts":[[2007,8]]},"assertion":[{"value":"2007-08-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}