{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:38:38Z","timestamp":1761007118945,"version":"build-2065373602"},"reference-count":24,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2013,1,24]],"date-time":"2013-01-24T00:00:00Z","timestamp":1358985600000},"content-version":"vor","delay-in-days":389,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc of Assoc for Info"],"published-print":{"date-parts":[[2012,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We propose the Least Information theory (LIT) to quantify meaning of information in probability distributions and derive a new document representation model for text classification. By extending Shannon entropy to accommodate a non\u2010linear relation between information and uncertainty, LIT offers an information\u2010centric approach to weight terms based on probability distributions in documents vs. in the collection. We develop two term weight quantities in the document classification context: 1) LI Binary (LIB) which quantifies (least) information due to the observation of a term's (binary) occurrence in a document; and 2) LI Frequency (LIF) which measures information for the observation of a randomly picked term from the document. Both quantities are computed given term distributions in the document collection as prior knowledge and can be used separately or combined to represent documents for text classification. We conduct classification experiments on three benchmark collections, in which the proposed methods show strong performances compared to classic TF*IDF. Particularly, the LIB*LIF weighting scheme, which combines LIB and LIF, outperforms TF*IDF in several experimental settings. Despite its similarity to TF*IDF, the formulation of LIB*LIF is very different and offers a new way of thinking for modeling information processes beyond classification.<\/jats:p>","DOI":"10.1002\/meet.14504901118","type":"journal-article","created":{"date-parts":[[2013,1,24]],"date-time":"2013-01-24T10:49:23Z","timestamp":1359024563000},"page":"1-10","source":"Crossref","is-referenced-by-count":2,"title":["Least information document representation for automated text classification"],"prefix":"10.1002","volume":"49","author":[{"given":"Weimao","family":"Ke","sequence":"first","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2013,1,24]]},"reference":[{"key":"e_1_2_6_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00153759"},{"key":"e_1_2_6_3_1","doi-asserted-by":"crossref","unstructured":"Aizawa A. (2000). The feature quantity: an information theoretic perspective of tfidf\u2010like measures. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval SIGIR '00 pages 104\u2013111 New York NY USA. ACM.","DOI":"10.1145\/345508.345556"},{"volume-title":"Atoms and Information Theory: An Introduction to Statistical Mechanics","year":"1971","author":"Baierlein R.","key":"e_1_2_6_4_1"},{"key":"e_1_2_6_5_1","unstructured":"Craven M. DiPasquo D. Freitag D. McCallum A. Mitchell T. Nigam K. and Slattery S. (1998). Learning to extract symbolic knowledge from the world wide web. In Proceedings of the fifteenth national\/tenth conference on Artificial intelligence\/Innovative applications of artificial intelligence AAAI '98\/IAAI '98 pages 509\u2013516 Menlo Park CA USA. American Association for Artificial Intelligence."},{"key":"e_1_2_6_6_1","doi-asserted-by":"publisher","DOI":"10.1063\/1.3057290"},{"key":"e_1_2_6_7_1","doi-asserted-by":"crossref","unstructured":"Ke W. Mostafa J. and Fu Y. (2007). Collaborative classifier agents: studying the impact of learning in distributed document classification. In Proceedings of the 7th ACM\/IEEE\u2010CS joint conference on Digital libraries JCDL '07 pages 428\u2013437 New York NY USA. ACM.","DOI":"10.1145\/1255175.1255263"},{"key":"e_1_2_6_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/319382.319394"},{"key":"e_1_2_6_9_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177729694"},{"key":"e_1_2_6_10_1","doi-asserted-by":"crossref","unstructured":"Lang K. (1995). Newsweeder: Learning to filter netnews. In Proceedings of the Twelfth International Conference on Machine Learning pages 331\u2013339.","DOI":"10.1016\/B978-1-55860-377-6.50048-7"},{"key":"e_1_2_6_11_1","first-page":"361","article-title":"Rcv1: A new benchmark collection for text categorization research","volume":"5","author":"Lewis D. D.","year":"2004","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_6_12_1","unstructured":"Liu T. Liu S. Cheng Z. and Ma W.\u2010Y. (2003). An evaluation on feature selection for text clustering. In Proceedings of the Twentieth International Conference on Machine Learning (ICML\u20102003) Washington DC."},{"key":"e_1_2_6_13_1","first-page":"22","article-title":"Development of a stemming algorithm","volume":"11","author":"Lovins J. B.","year":"1968","journal-title":"Mechanical Translation and Computational Linguistics"},{"key":"e_1_2_6_14_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071"},{"key":"e_1_2_6_15_1","doi-asserted-by":"publisher","DOI":"10.1108\/00220410410560582"},{"key":"e_1_2_6_16_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000019"},{"key":"e_1_2_6_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/505282.505283"},{"key":"e_1_2_6_18_1","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1948.tb01338.x"},{"key":"e_1_2_6_19_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630340110"},{"key":"e_1_2_6_20_1","doi-asserted-by":"crossref","unstructured":"Siegler M. and Witbrock M. (1999). Improving the suitability of imperfect transcriptions for information retrieval from spoken documents. In Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP) pages 505\u2013508. IEEE Press.","DOI":"10.1109\/ICASSP.1999.758173"},{"key":"e_1_2_6_21_1","doi-asserted-by":"publisher","DOI":"10.1108\/00220410410560573"},{"key":"e_1_2_6_22_1","doi-asserted-by":"crossref","unstructured":"Taulbee O. E. (1965). Invited papers: classification in information storage and retrieval. In Proceedings of the 1965 20th national conference ACM '65 pages 119\u2013137 New York NY USA. ACM. Chairman\u2010House R. W.","DOI":"10.1145\/800197.806038"},{"volume-title":"Data Mining: Practical machine learning tools and techniques","year":"2005","author":"Witten I. H.","key":"e_1_2_6_23_1"},{"key":"e_1_2_6_24_1","unstructured":"Yang Y. and Pedersen J. O. (1997). A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning ICML '97 pages 412\u2013420 San Francisco CA USA. Morgan Kaufmann Publishers Inc."},{"key":"e_1_2_6_25_1","doi-asserted-by":"crossref","unstructured":"Zhang D. Wang J. and Si L. (2011). Document clustering with universum. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval SIGIR '11 pages 873\u2013882 New York NY USA. ACM.","DOI":"10.1145\/2009916.2010033"}],"container-title":["Proceedings of the American Society for Information Science and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fmeet.14504901118","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fmeet.14504901118","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/meet.14504901118","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T11:34:32Z","timestamp":1760960072000},"score":1,"resource":{"primary":{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/10.1002\/meet.14504901118"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,1]]},"references-count":24,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,1]]}},"alternative-id":["10.1002\/meet.14504901118"],"URL":"https:\/\/doi.org\/10.1002\/meet.14504901118","archive":["Portico"],"relation":{},"ISSN":["0044-7870","1550-8390"],"issn-type":[{"type":"print","value":"0044-7870"},{"type":"electronic","value":"1550-8390"}],"subject":[],"published":{"date-parts":[[2012,1]]}}}