{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T06:20:56Z","timestamp":1775283656359,"version":"3.50.1"},"reference-count":16,"publisher":"Wiley","issue":"4","license":[{"start":{"date-parts":[[2007,3,22]],"date-time":"2007-03-22T00:00:00Z","timestamp":1174521600000},"content-version":"vor","delay-in-days":11587,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Am. Soc. Inf. Sci."],"published-print":{"date-parts":[[1975,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The problem studied in this research is that of developing a set of formal statistical rules for the purpose of identifying the keywords of a document\u2010words likely to be useful as index terms for that document. The research was prompted by the observation, made by a number of writers, that non\u2010specialty words, words which possess little value for indexing purposes, tend to be distributed at random in a collection of documents. In contrast, specialty words are not so distributed.<\/jats:p><jats:p>In Part I of the study, a mixture of two Poisson distributions is examined in detail as a model of specialty word distribution, and formulas expressing the three parameters of the model in terms of empirical frequency statistics are derived. The fit of the model is tested on an experimental document collection and found to be acceptable for the purposes of the study. A measure intended to identify specialty words, consistent with the 2\u2010Poisson model, is proposed and evaluated.<\/jats:p>","DOI":"10.1002\/asi.4630260402","type":"journal-article","created":{"date-parts":[[2007,6,28]],"date-time":"2007-06-28T01:06:12Z","timestamp":1182992772000},"page":"197-206","source":"Crossref","is-referenced-by-count":120,"title":["A probabilistic approach to automatic keyword indexing. Part I. On the Distribution of Specialty Words in a Technical Literature"],"prefix":"10.1002","volume":"26","author":[{"given":"Stephen P.","family":"Harter","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"311","published-online":{"date-parts":[[2007,3,22]]},"reference":[{"key":"e_1_2_1_1_2","doi-asserted-by":"publisher","DOI":"10.1002\/asi.5090190409"},{"key":"e_1_2_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/321075.321084"},{"key":"e_1_2_1_3_2","first-page":"152","article-title":"Distributional Constraints and the Automatic Selection of an Indexing Vocabulary","volume":"4","author":"Curtice R. M.","year":"1967","journal-title":"Proceedings of the American Documentation Institute Annual Meeting"},{"key":"e_1_2_1_4_2","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630250505"},{"key":"e_1_2_1_5_2","volume-title":"Information Storage and Retrieval (ISR\u2010XVIII)","author":"Bonwit K.","year":"1970"},{"key":"e_1_2_1_6_2","volume-title":"Statistic Association Methods for Mechanized Documentation","author":"Dennis S. F.","year":"1965"},{"key":"e_1_2_1_7_2","volume-title":"Abstracts of the Standard Edition of the Complete Psychological Works of Sigmund Freud","author":"Rothgeb C. L.","year":"1972"},{"key":"e_1_2_1_8_2","volume-title":"A General Selection from the Works of Sigmund Freud","author":"Rickman M. D.","year":"1957"},{"key":"e_1_2_1_9_2","volume-title":"Computer Compiled Cumulative Index of the Standard Edition of the Complete Psychological Works of Sigmund Freud","author":"Klumpner G. H. M. D.","year":"1970"},{"key":"e_1_2_1_10_2","volume-title":"Classical and Contagious Discrete Distributions, Proceedings of the International Symposium (McGill University), Montreal, Canada, August 15\u201320, 1963","author":"Blischke W. R.","year":"1965"},{"key":"e_1_2_1_11_2","first-page":"186","volume-title":"Introduction to the Theory of Statistics","author":"Mood A. M.","year":"1963"},{"key":"e_1_2_1_12_2","first-page":"225","article-title":"Estimating the Parameters of Mixed Poisson, Binomial, and Weibull Distributions by the Method of Moments","volume":"39","author":"Rider P. R.","year":"1961","journal-title":"Bulletin of the Institute for International Statistics"},{"key":"e_1_2_1_13_2","unstructured":"Harter S. A Probabilistic Approach to Automatic Keyword Indexing Ph.D. Dissertation University of Chicago (1974)."},{"key":"e_1_2_1_14_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.141.3577.245"},{"key":"e_1_2_1_15_2","doi-asserted-by":"publisher","DOI":"10.1108\/eb026442"},{"key":"e_1_2_1_16_2","volume-title":"Discriminant Analysis: The Study of Group Difference","author":"Tatsuoka M. M.","year":"1970"}],"container-title":["Journal of the American Society for Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fasi.4630260402","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/asi.4630260402","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T17:28:36Z","timestamp":1759944516000},"score":1,"resource":{"primary":{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/10.1002\/asi.4630260402"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1975,7]]},"references-count":16,"journal-issue":{"issue":"4","published-print":{"date-parts":[[1975,7]]}},"alternative-id":["10.1002\/asi.4630260402"],"URL":"https:\/\/doi.org\/10.1002\/asi.4630260402","archive":["Portico"],"relation":{},"ISSN":["0002-8231","1097-4571"],"issn-type":[{"value":"0002-8231","type":"print"},{"value":"1097-4571","type":"electronic"}],"subject":[],"published":{"date-parts":[[1975,7]]}}}