{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:44:55Z","timestamp":1761007495529,"version":"build-2065373602"},"reference-count":13,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2012,1,11]],"date-time":"2012-01-11T00:00:00Z","timestamp":1326240000000},"content-version":"vor","delay-in-days":375,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc of Assoc for Info"],"published-print":{"date-parts":[[2011,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Building topic models in federated digital collections presents numerous challenges due to metadata inconsistencies. The quality of topical metadata is difficult to ascertain and is interspersed with often irrelevant administrative metadata. In this study, we propose a way to improve topic modeling in large collections by identifying documents that convey only weak topical information. These documents are ignored when training topic models. Their topical associations are instead inferred after model training. A method is outlined for identifying weakly topical documents by defining <jats:italic>runs<\/jats:italic> of similar documents in a collection. In preliminary evaluation using a corpus from the Institute of Museum and Library Services Digital Collections and Content aggregation, results show an increase in coherence among words in topics. 
In showing this, we demonstrate that it may be beneficial to induce topic models using less, higher\u2010quality data.<\/jats:p>","DOI":"10.1002\/meet.2011.14504801048","type":"journal-article","created":{"date-parts":[[2012,1,11]],"date-time":"2012-01-11T12:23:03Z","timestamp":1326284583000},"page":"1-10","source":"Crossref","is-referenced-by-count":8,"title":["Building topic models in a federated digital library through selective document exclusion"],"prefix":"10.1002","volume":"48","author":[{"given":"Miles","family":"Efron","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Organisciak","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Katrina","family":"Fenlon","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"311","published-online":{"date-parts":[[2012,1,11]]},"reference":[{"key":"e_1_2_8_2_1","doi-asserted-by":"crossref","unstructured":"Baillie M. Carman M. &Crestani F.(2011 Forthcoming).A multi\u2010collection latent topic model for federated search.Information Retrieval.","DOI":"10.1007\/s10791-010-9147-3"},{"key":"e_1_2_8_3_1","doi-asserted-by":"crossref","unstructured":"Blei D. M. &Jordan M. I.(2003).Modeling Annotated Data.Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval. New York NY USA: ACM.","DOI":"10.1145\/860458.860460"},{"key":"e_1_2_8_4_1","unstructured":"Blei D. M. Ng A. Y. &Jordan M. I.(2003).Latent Dirichlet allocation.Journal of Machine Learning Research(3) 993\u20131022."},{"volume-title":"Reading tea leaves: How humans interpret topic models","author":"Chang J.","key":"e_1_2_8_5_1"},{"volume-title":"Statistical Language Models for Information Retrieval","year":"2008","author":"Zhai C. X.","key":"e_1_2_8_6_1"},{"key":"e_1_2_8_7_1","doi-asserted-by":"crossref","unstructured":"Mimno D. 
&McCallum A.(2007).Organizing the OCA: learning faceted subjects from a library of digital books.Proceedings of the 7th ACM\/IEEE\u2010CS joint conference on Digital libraries. New York: ACM.","DOI":"10.1145\/1255175.1255249"},{"key":"e_1_2_8_8_1","unstructured":"Newman D. Lau J. H. Grieser K. &Baldwin T.(2010).Automatic evaluation of topic coherence.HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics(pp.100\u2013108). Stroudsburg PA USA: Association for Computational Linguistics."},{"key":"e_1_2_8_9_1","unstructured":"Palmer C.L.&Knutson E.M.(2004).Metadata practices and implications for federated collections.Proceedings of the 67th Annual Meeting of the American Society for Information Science and Technology. (Providence RI Nov. 12\u201317)."},{"key":"e_1_2_8_10_1","doi-asserted-by":"crossref","unstructured":"Palmer C. L. Zavalina O. Fenlon K.(2010).Beyond size and search: Building contextual mass in digital aggregations for scholarly use. InProceedings of the ASIS&T Annual Meeting. (Pittsburgh PA Oct. 22\u201327).","DOI":"10.1002\/meet.14504701213"},{"key":"e_1_2_8_11_1","unstructured":"Shreeves S.L. Knutson E.M. Stvilia B. Palmer C.L. Twidale M.B. &Cole T.W.Is \u2018quality\u2019 metadata \u2018shareable\u2019 metadata? The implications of local metadata practice on federated collections.Proceedings of the Twelfth National Conference of the Association of College and Research Libraries (April 7\u201310 2005 Minneapolis MN April 7\u201310)."},{"key":"e_1_2_8_12_1","doi-asserted-by":"publisher","DOI":"10.1198\/016214506000000302"},{"key":"e_1_2_8_13_1","doi-asserted-by":"crossref","unstructured":"Wickett K. Renear A. Urban R.(2010).Rule categories for collection\/item metadata relationships.Proceedings of the ASIS&T Annual Meeting. (Pittsburgh PA Oct. 
22\u201327).","DOI":"10.1002\/meet.14504701218"},{"key":"e_1_2_8_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89533-6_10"}],"container-title":["Proceedings of the American Society for Information Science and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fmeet.2011.14504801048","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/meet.2011.14504801048","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T13:49:05Z","timestamp":1760968145000},"score":1,"resource":{"primary":{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/10.1002\/meet.2011.14504801048"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,1]]},"references-count":13,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2011,1]]}},"alternative-id":["10.1002\/meet.2011.14504801048"],"URL":"https:\/\/doi.org\/10.1002\/meet.2011.14504801048","archive":["Portico"],"relation":{},"ISSN":["0044-7870","1550-8390"],"issn-type":[{"type":"print","value":"0044-7870"},{"type":"electronic","value":"1550-8390"}],"subject":[],"published":{"date-parts":[[2011,1]]}}}