{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T00:23:26Z","timestamp":1777854206707,"version":"3.51.4"},"reference-count":15,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[1987,12,1]],"date-time":"1987-12-01T00:00:00Z","timestamp":565315200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[1987,12]]},"abstract":"<jats:p>The use of automatic classification techniques has been suggested as a means of increasing the effectiveness of document retrieval systems; however, the automatic generation of a classification requires a large amount of computation, and it is thus of importance to know whether this computation will result in material increases in retrieval performance. This paper describes three methods - the overlap test, the nearest neighbour test and the density test - which can be used to measure the degree of clustering tendency in a set of documents. It is shown that the three tests are not in complete agreement with each other in their evaluation of the degree of clustering tendency present in seven document test collections. 
A comparison of the predicted degree of clustering tendency with the relative effectiveness of cluster and non-cluster searches suggests that the density test gives the most useful results; it also has the advantage that it does not require query and relevance data and can thus be used in a predictive manner when a document collection is to be processed for the first time.<\/jats:p>","DOI":"10.1177\/016555158701300607","type":"journal-article","created":{"date-parts":[[2007,3,17]],"date-time":"2007-03-17T23:37:35Z","timestamp":1174174655000},"page":"361-365","source":"Crossref","is-referenced-by-count":29,"title":["Techniques for the measurement of clustering tendency in document retrieval systems"],"prefix":"10.1177","volume":"13","author":[{"given":"Abdelmoula","family":"El-Hamdouchi","sequence":"first","affiliation":[{"name":"Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Willett","sequence":"additional","affiliation":[{"name":"Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[1987,12,1]]},"reference":[{"key":"atypb1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630350503"},{"key":"atypb2","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4379(80)90010-1"},{"key":"atypb3","volume-title":"The use of inter-document relationships in information retrieval","author":"A. 
El-Hamdouchi","year":"1987"},{"key":"atypb4","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630370102"},{"key":"atypb5","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630310411"},{"key":"atypb6","doi-asserted-by":"publisher","DOI":"10.1016\/0020-0271(71)90051-9"},{"key":"atypb7","doi-asserted-by":"publisher","DOI":"10.1016\/S0020-7373(86)80063-3"},{"key":"atypb8","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/26.4.354"},{"key":"atypb9","volume-title":"Introduction to Modern Information Retrieval","author":"G. Salton","year":"1983"},{"key":"atypb10","volume-title":"Non-Parametric Statistics","author":"S. Siegel","year":"1956"},{"key":"atypb11","volume-title":"Information Retrieval","author":"C.J. van Rijsbergen","year":"1979"},{"key":"atypb12","doi-asserted-by":"publisher","DOI":"10.1108\/eb026557"},{"key":"atypb13","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(86)90006-3"},{"key":"atypb14","volume-title":"The effectiveness and efficiency of agglomerative hierarchic clustering in document retrieval","author":"E.M. Voorhees","year":"1985"},{"key":"atypb15","volume-title":"The cluster hypothesis revisited. Technical report TR 85-658","author":"E.M. 
Voorhees","year":"1985"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/016555158701300607","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/016555158701300607","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T23:03:43Z","timestamp":1777503823000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/016555158701300607"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1987,12]]},"references-count":15,"journal-issue":{"issue":"6","published-print":{"date-parts":[[1987,12]]}},"alternative-id":["10.1177\/016555158701300607"],"URL":"https:\/\/doi.org\/10.1177\/016555158701300607","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"value":"0165-5515","type":"print"},{"value":"1741-6485","type":"electronic"}],"subject":[],"published":{"date-parts":[[1987,12]]}}}