{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:28:06Z","timestamp":1761006486699,"version":"build-2065373602"},"reference-count":38,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2014,5,8]],"date-time":"2014-05-08T00:00:00Z","timestamp":1399507200000},"content-version":"vor","delay-in-days":492,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc of Assoc for Info"],"published-print":{"date-parts":[[2013,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Searching large collections of digitized books is a relatively new area in information\u2010seeking and retrieval research, made possible by initiatives such as Google Books and the HathiTrust Digital Library. The availability of large full\u2010text book collections is transforming how users search and interact with information in books, but the characteristics of these changes are unknown. This paper aims to provide insight into the characteristics of full\u2010text searches in a large collection of digitized books and is the first step in a broader research agenda intended to improve book retrieval. To better understand the types of queries that users are issuing to full\u2010text\u2010book collections, we analyzed a full year of anonymized query logs from the HathiTrust Digital Library full\u2010text search engine. We also manually classified a random sample of 600 queries to develop a taxonomy of book search query types. We found that users are beginning to search <jats:italic>for information in books<\/jats:italic> instead of searching for <jats:italic>books<\/jats:italic>. Searches still largely follow bibliographic models, but, as expected, new types of searches are beginning to take advantage of full\u2010text capabilities. Additionally, comparing the results of our query log analysis to searches in other domains, we found similar search patterns including short queries, sessions with only a few queries, and users viewing only a few pages of results per query. We discuss how these findings can be used to characterize users of large full\u2010text book collections.<\/jats:p>","DOI":"10.1002\/meet.14505001085","type":"journal-article","created":{"date-parts":[[2014,5,8]],"date-time":"2014-05-08T16:35:28Z","timestamp":1399566928000},"page":"1-10","source":"Crossref","is-referenced-by-count":3,"title":["Finding information in books: Characteristics of full\u2010text searches in a collection of 10 million books"],"prefix":"10.1002","volume":"50","author":[{"given":"Craig","family":"Willis","sequence":"first","affiliation":[]},{"given":"Miles","family":"Efron","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2014,5,8]]},"reference":[{"key":"e_1_2_9_2_1","doi-asserted-by":"crossref","unstructured":"Arguello J. Diaz F. Callan J. &Crespo J.(2009).Sources of evidence for vertical selection. InProceedings of the 32nd international ACM conference on research and development in information retrieval (SIGIR '09)(pp.315\u2013322).","DOI":"10.1145\/1571941.1571997"},{"key":"e_1_2_9_3_1","doi-asserted-by":"crossref","unstructured":"Beitzel S. M. Jensen E. C. Chowdhury A. &Frieder O.(2007).Varying approaches to topical web query classification. InProceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR '07)(pp.783\u2013784).","DOI":"10.1145\/1277741.1277907"},{"issue":"2","key":"e_1_2_9_4_1","first-page":"3","article-title":"The concept of a work in WorldCat: an application of FRBR","volume":"36","author":"Bennett R.","year":"2003","journal-title":"ACM SIGIR Forum"},{"key":"e_1_2_9_5_1","doi-asserted-by":"crossref","unstructured":"Broder A. Fontoura M. Gabrilovich E. Joshi A. Josifovski V. &Zhang T.(2007).Robust classification of rare queries using web knowledge. InProceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR '07)(pp.231\u2013238).","DOI":"10.1145\/1277741.1277783"},{"key":"e_1_2_9_6_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630300305"},{"key":"e_1_2_9_7_1","unstructured":"Burton\u2010West T.(2012).Practical Relevance Ranking for 10 Million Books. InINEX 2012 workshop pre\u2010proceedings."},{"key":"e_1_2_9_8_1","doi-asserted-by":"crossref","unstructured":"Carden M.(2008).E\u2010books are not books. InProceedings of the 2008 acm workshop on research advances in large digital book repositories.","DOI":"10.1145\/1458412.1458416"},{"key":"e_1_2_9_9_1","doi-asserted-by":"crossref","unstructured":"Efron M.(2013).Query representation for cross\u2010temporal information retrieval. InProceedings of the 36th international ACM SIGIR conference on research and development in information (SIGIR'13).","DOI":"10.1145\/2484028.2484054"},{"key":"e_1_2_9_10_1","doi-asserted-by":"crossref","unstructured":"Hearst M. A. Hurst M. &Dumais S. T.(2008).What should blog search look like?InProceedings of the 2008 ACM workshop on search in social media (SSM'08(pp.95\u201398).","DOI":"10.1145\/1458583.1458599"},{"key":"e_1_2_9_11_1","doi-asserted-by":"publisher","DOI":"10.1045\/september2002-hickey"},{"key":"e_1_2_9_12_1","unstructured":"Howard J.(2012 March).Google Begins to Scale Back Its Scanning of Books From University Libraries.Chronicle of Higher Education."},{"key":"e_1_2_9_13_1","unstructured":"Internet Archive. (2013).Scanning Services. Retrieved 5\/24\/2013 fromhttp:\/\/archive.org\/scanning"},{"key":"e_1_2_9_14_1","doi-asserted-by":"crossref","unstructured":"Jansen B. J. Zhang M. Sobel K. &Chowdury A.(2009).Micro\u2010blogging as online word of mouth branding. InProc. of the 27th international conference extended abstracts on human factors in computing systems (CHI EA'09)(pp.3859\u20133864).","DOI":"10.1145\/1520340.1520584"},{"key":"e_1_2_9_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s007999900022"},{"key":"e_1_2_9_16_1","doi-asserted-by":"crossref","unstructured":"Kang I.\u2010H. &Kim G.(2003).Query type classification for web document retrieval. InProceedings of the 26th annual international ACM SIGIR conference on research and development in informaion retrieval (SIGIR'03)(pp.64\u201371).","DOI":"10.1145\/860435.860449"},{"key":"e_1_2_9_17_1","doi-asserted-by":"crossref","unstructured":"Kazai G. &Landoni M.(2012).BooksOnline'12: 5th Workshop on Online Books Complementary Social Media and their Impact. InProceedings of the 20th annual conference on information an knowledge management (CIKM'12)(pp.2764\u20132765).","DOI":"10.1145\/2396761.2398757"},{"key":"e_1_2_9_18_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(1999)50:3<265::AID-ASI9>3.0.CO;2-R"},{"key":"e_1_2_9_19_1","doi-asserted-by":"crossref","unstructured":"Kim J. Feild H. &Cartright M.\u2010A.(2012).Understanding Book Search Behavior on the Web. InProceedings of the 21st acm international conference on information and knowledge management (CIKM'12)(pp.744\u2013753).","DOI":"10.1145\/2396761.2396856"},{"key":"e_1_2_9_20_1","doi-asserted-by":"crossref","unstructured":"Koolen M. Kazai G. Kamps J. Preminger M. Doucet A. &Landoni M.(2012).Overview of the INEX 2012 social book search track. InCLEF online working notes.","DOI":"10.1007\/978-3-642-35734-3_1"},{"key":"e_1_2_9_21_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199104)42:3<197::AID-ASI6>3.0.CO;2-T"},{"key":"e_1_2_9_22_1","doi-asserted-by":"crossref","unstructured":"Lee J. H. Renear A. &Smith L. C.(2007).Known\u2010Item Search: Variations on a Concept. InProceedings of the american society for information science and technology(pp.1\u201317).","DOI":"10.1002\/meet.14504301126"},{"key":"e_1_2_9_23_1","doi-asserted-by":"crossref","unstructured":"Li X. Wang Y.\u2010Y. &Acero A.(2008).Learning query intent from regularized click graphs. InProceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval (SIGIR '08)(pp.339\u2013346).","DOI":"10.1145\/1390334.1390393"},{"volume-title":"Library Research Models","year":"1993","author":"Mann T.","key":"e_1_2_9_24_1"},{"key":"e_1_2_9_25_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20462"},{"key":"e_1_2_9_26_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.1199644"},{"key":"e_1_2_9_27_1","doi-asserted-by":"crossref","unstructured":"Morris M. R. Teevan J. &Panovich K.(2008).What Do People Ask Their Social Networks and Why? A Survey Study of Status Message Q & A Behavior. InProceedings of the SIGCHI conference on human factors in computing systems (CHI'08)(pp.1739\u20131748).","DOI":"10.1145\/1753326.1753587"},{"key":"e_1_2_9_28_1","doi-asserted-by":"crossref","unstructured":"Ponte J. M. &Croft W. B.(1998).A language modeling approach to information retrieval.Research and Development in Information Retrieval 275\u2013281.","DOI":"10.1145\/290941.291008"},{"key":"e_1_2_9_29_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb026647"},{"key":"e_1_2_9_30_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(2000)51:8<757::AID-ASI80>3.0.CO;2-T"},{"key":"e_1_2_9_31_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630310505"},{"key":"e_1_2_9_32_1","doi-asserted-by":"publisher","DOI":"10.1002\/1097-4571(2000)9999:9999<::AID-ASI1591>3.0.CO;2-R"},{"volume-title":"Functional Requirements for Bibliographic Records","year":"1998","author":"Suar K.","key":"e_1_2_9_33_1"},{"key":"e_1_2_9_34_1","doi-asserted-by":"crossref","unstructured":"Teevan J. Ramage D. &Morris M. R.(2011).# TwitterSearch : A Comparison of Microblog Search and Web Search. InProceedings of the fourth ACM international conference on web search and data mining (WSDM'11)(pp.35\u201344).","DOI":"10.1145\/1935826.1935842"},{"issue":"2","key":"e_1_2_9_35_1","first-page":"150","article-title":"A Taxonomy of Bibliographic Relationships","volume":"35","author":"Tillett B. B.","year":"1991","journal-title":"Library Resources & Technical Services"},{"issue":"2","key":"e_1_2_9_36_1","first-page":"162","article-title":"Bibliographic Relationships: An Empirical Study of the LC Machine\u2010Readable Records","volume":"36","author":"Tillett B. B.","year":"1992","journal-title":"Library Resources & Technical Services"},{"key":"e_1_2_9_37_1","doi-asserted-by":"crossref","unstructured":"White R. W. &Morris D.(2007).Investigating the querying and browsing behavior of advanced search engine users. InProceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR '07)(pp.255\u2013262).","DOI":"10.1145\/1277741.1277787"},{"issue":"3","key":"e_1_2_9_38_1","article-title":"HathiTrust: The Elephant in the Library","volume":"32","author":"York J.","year":"2012","journal-title":"Library Issues"},{"issue":"2","key":"e_1_2_9_39_1","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1145\/984321.984322","article-title":"A study of smoothing methods for language models applied to information retrieval","volume":"2","author":"Zhai C.","year":"2004","journal-title":"ACM Transactions on Information Systems"}],"container-title":["Proceedings of the American Society for Information Science and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fmeet.14505001085","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fmeet.14505001085","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/meet.14505001085","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T09:14:44Z","timestamp":1760951684000},"score":1,"resource":{"primary":{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/10.1002\/meet.14505001085"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,1]]},"references-count":38,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2013,1]]}},"alternative-id":["10.1002\/meet.14505001085"],"URL":"https:\/\/doi.org\/10.1002\/meet.14505001085","archive":["Portico"],"relation":{},"ISSN":["0044-7870","1550-8390"],"issn-type":[{"type":"print","value":"0044-7870"},{"type":"electronic","value":"1550-8390"}],"subject":[],"published":{"date-parts":[[2013,1]]}}}