{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,19]],"date-time":"2025-08-19T10:42:53Z","timestamp":1755600173467,"version":"3.38.0"},"reference-count":27,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2007,7,12]],"date-time":"2007-07-12T00:00:00Z","timestamp":1184198400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2008,4]]},"abstract":"<jats:p> In current library practice, trained human experts usually carry out document cataloguing and indexing based on a manual approach. With the explosive growth in the number of electronic documents available on the Internet and digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials using just a manual approach. To improve the effectiveness and efficiency of document categorization at the library setting, more in-depth studies of using automatic document classification methods to categorize library items are required. Machine learning research has advanced rapidly in recent years. However, applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper illustrates the design and development of a machine learning based automatic document classification system to alleviate the manual categorization problem encountered within the library setting. Two supervised machine learning algorithms have been tested. Our empirical tests show that supervised machine learning algorithms in general, and the k-nearest neighbours (KNN) algorithm in particular, can be used to develop an effective document classification system to enhance current library practice. 
Moreover, some concrete recommendations regarding how to practically apply the KNN algorithm to develop automatic document classification in a library setting are made. To our best knowledge, this is the first in-depth study of applying the KNN algorithm to automatic document classification based on the widely used LCC classification scheme adopted by many large libraries. <\/jats:p>","DOI":"10.1177\/0165551507082592","type":"journal-article","created":{"date-parts":[[2007,12,4]],"date-time":"2007-12-04T01:26:12Z","timestamp":1196731572000},"page":"213-230","source":"Crossref","is-referenced-by-count":18,"title":["A comparative study of two automatic document classification methods in a library setting"],"prefix":"10.1177","volume":"34","author":[{"given":"Joanna Yi-Hang","family":"Pong","sequence":"first","affiliation":[{"name":"Run Run Shaw Library, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong"}]},{"given":"Ron Chi-Wai","family":"Kwok","sequence":"additional","affiliation":[{"name":"Department of Information Systems, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong"}]},{"given":"Raymond Yiu-Keung","family":"Lau","sequence":"additional","affiliation":[{"name":"Department of Information Systems, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong"}]},{"given":"Jin-Xing","family":"Hao","sequence":"additional","affiliation":[{"name":"Department of Information Systems, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong"}]},{"given":"Percy Ching-Chi","family":"Wong","sequence":"additional","affiliation":[{"name":"Department of Information Systems, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong"}]}],"member":"179","published-online":{"date-parts":[[2007,7,12]]},"reference":[{"key":"atypb1","doi-asserted-by":"publisher","DOI":"10.1016\/j.acalib.2007.02.003"},{"key":"atypb2","doi-asserted-by":"publisher","DOI":"10.1016\/S1464-9055(03)00069-1"},{"volume-title":"Proceedings of the 5th 
ACM\/IEEE-CS Joint Conference on Digital libraries","author":"D. Levy","key":"atypb3"},{"key":"atypb4","doi-asserted-by":"publisher","DOI":"10.1016\/S0099-1333(99)80058-5"},{"key":"atypb5","doi-asserted-by":"publisher","DOI":"10.1016\/S0957-4174(97)00001-8"},{"key":"atypb6","doi-asserted-by":"publisher","DOI":"10.1177\/016555150302900204"},{"key":"atypb7","doi-asserted-by":"publisher","DOI":"10.1145\/505282.505283"},{"volume-title":"Proceedings of ECML 98, 10th European Conference on Machine Learning","author":"T. Joachims","key":"atypb8"},{"volume-title":"Proceedings of the 14th International Conference on Machine Learning ICML97","author":"D. Koller","key":"atypb9"},{"key":"atypb10","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(02)00022-5"},{"volume-title":"Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval","author":"Y. Yang","key":"atypb11"},{"volume-title":"Organizing digital libraries by automated text categorization","year":"2002","author":"H. Avancini","key":"atypb12"},{"key":"atypb13","doi-asserted-by":"publisher","DOI":"10.1147\/rd.22.0159"},{"volume-title":"Introduction to Modern Information Retrieval","year":"1983","author":"G. Salton","key":"atypb14"},{"volume-title":"Proceedings of the 21st International Conference on Research and Development in Information Retrieval (SIGIR'98)","author":"L.D. Baker","key":"atypb15"},{"key":"atypb16","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009983522080"},{"key":"atypb17","doi-asserted-by":"publisher","DOI":"10.1145\/345508.345593"},{"key":"atypb18","doi-asserted-by":"publisher","DOI":"10.1145\/511144.511148"},{"key":"atypb19","doi-asserted-by":"publisher","DOI":"10.1145\/792550.792563"},{"volume-title":"Exploiting LCSH, LCC and DDC to retrieve networked resources issues and challenges (2000)","year":"2007","author":"L.M. 
Chan","key":"atypb20"},{"volume-title":"Exploiting LCSH, LCC, and DDC to retrieve network resources (2001)","year":"2007","author":"D. Vizine-Goetz","key":"atypb21"},{"volume-title":"The Library of Congress Classification as a knowledge base for automatic subject categorization, IFLA Preconference \"Subject Retrieval in a Networked Environment\"","year":"2001","author":"J. Godby","key":"atypb22"},{"volume-title":"23rd International Online Information Meeting","author":"G. Moller","key":"atypb23"},{"volume-title":"Proceedings of ECML-98, 10th European Conference on Machine Learning","author":"D.D. Lewis","key":"atypb24"},{"key":"atypb25","doi-asserted-by":"publisher","DOI":"10.1108\/eb046814"},{"volume-title":"Machine Learning","year":"1997","author":"T.M. Mitchell","key":"atypb26"},{"volume-title":"Reuters-21578 text categorization test collection","year":"1997","author":"D.D. Lewis","key":"atypb27"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551507082592","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551507082592","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T03:36:46Z","timestamp":1740973006000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0165551507082592"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,7,12]]},"references-count":27,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2008,4]]}},"alternative-id":["10.1177\/0165551507082592"],"URL":"https:\/\/doi.org\/10.1177\/0165551507082592","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"type":"print","value":"0165-5515"},{"type":"electronic","value":"1741-6485"}],"subject":[],"published":{
"date-parts":[[2007,7,12]]}}}