{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T05:43:08Z","timestamp":1770961388902,"version":"3.50.1"},"reference-count":51,"publisher":"Emerald","issue":"1","license":[{"start":{"date-parts":[[2022,4,27]],"date-time":"2022-04-27T00:00:00Z","timestamp":1651017600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["JD"],"published-print":{"date-parts":[[2023,1,10]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title><jats:p>With an explosion of datasets available on the Web, dataset search has gained attention as an emerging research domain. Understanding users' dataset behaviour is imperative for providing effective data discovery services. In this paper, the authors present a study on users' dataset search behaviour through the analysis of search logs from a research data discovery portal.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title><jats:p>Using query and session based features, the authors apply cluster analysis to discover distinct user profiles with different search behaviours. One particular behavioural construct of our interest is users' expertise that the authors generate via computing semantic similarity between users' search queries and the title of metadata records in the displayed search results.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Findings<\/jats:title><jats:p>The findings revealed that there are six distinct classes of user behaviours for dataset search, namely; Expert Research, Expert Search, Expert Explore, Novice Research, Novice Search and Novice Explore.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Research limitations\/implications<\/jats:title><jats:p>The user profiles are derived based on analysis of the search log of the research data catalogue in this study. Further research is needed to generalise the user profiles to other dataset search settings. Future research can take on a confirmatory approach to verify these user groups and establish a deeper understanding of their information needs.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Practical implications<\/jats:title><jats:p>The findings in this paper have implications for designing search systems that tailor search results matching the diverse information needs of different user groups.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title><jats:p>We propose for the first time a taxonomy of users for dataset search based on their domain expertise and search behaviour.<\/jats:p><\/jats:sec>","DOI":"10.1108\/jd-12-2021-0245","type":"journal-article","created":{"date-parts":[[2022,4,26]],"date-time":"2022-04-26T05:59:50Z","timestamp":1650952790000},"page":"66-85","source":"Crossref","is-referenced-by-count":10,"title":["Large-scale analysis of query logs to profile users for dataset search"],"prefix":"10.1108","volume":"79","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0957-5380","authenticated-orcid":false,"given":"Romina","family":"Sharifpour","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1206-3431","authenticated-orcid":false,"given":"Mingfang","family":"Wu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5558-3790","authenticated-orcid":false,"given":"Xiuzhen","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"140","published-online":{"date-parts":[[2022,4,27]]},"reference":[{"issue":"2","key":"key2023012512322329100_ref001","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1086\/602333","article-title":"Topic knowledge and online catalog search formulation","volume":"61","year":"1991","journal-title":"The Library Quarterly"},{"key":"key2023012512322329100_ref002","first-page":"103","article-title":"Impact of response latency on user behavior in web search","year":"2014"},{"issue":"11","key":"key2023012512322329100_ref003","doi-asserted-by":"crossref","first-page":"2635","DOI":"10.1002\/asi.23617","article-title":"Is exploratory search different? A comparison of information search behavior for exploratory and lookup tasks","volume":"67","year":"2016","journal-title":"Journal of the Association for Information Science and Technology"},{"key":"key2023012512322329100_ref004","article-title":"Important cognitive components of domain-specific search knowledge","volume-title":"TREC","year":"2001"},{"key":"key2023012512322329100_ref005","first-page":"610","article-title":"Domain-specific search strategies for the effective retrieval of healthcare and shopping information","year":"2002"},{"issue":"1","key":"key2023012512322329100_ref006","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1002\/asi.20238","article-title":"Strategy hubs: domain portals to help find comprehensive information","volume":"57","year":"2006","journal-title":"Journal of the American Society for Information Science and Technology"},{"key":"key2023012512322329100_ref007","first-page":"1365","article-title":"Google dataset search: building a search engine for datasets in an open web ecosystem","year":"2019"},{"issue":"2","key":"key2023012512322329100_ref008","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1145\/792550.792552","article-title":"A taxonomy of web search","volume":"36","year":"2002","journal-title":"ACM Sigir Forum"},{"key":"key2023012512322329100_ref009","article-title":"Using centroids of word embeddings and word mover's distance for biomedical document retrieval in question answering","year":"2016"},{"key":"key2023012512322329100_ref010","article-title":"Characteristics of dataset retrieval sessions: experiences from a real-life digital library","year":"2020"},{"issue":"1","key":"key2023012512322329100_ref011","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1007\/s00778-019-00564-x","article-title":"Dataset search: a survey","volume":"29","year":"2020","journal-title":"The VLDB Journal"},{"key":"key2023012512322329100_ref012","first-page":"2445","article-title":"Towards more useable dataset search: from query characterization to snippet generation","year":"2019"},{"key":"key2023012512322329100_ref013","first-page":"221","article-title":"Actively predicting diverse search intent from user browsing behaviors","year":"2010"},{"key":"key2023012512322329100_ref014","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","year":"2018"},{"issue":"5","key":"key2023012512322329100_ref015","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1111\/j.1365-2729.2004.00093.x","article-title":"Searching for information in an online public access catalogue (opac): the impacts of information search expertise on the use of boolean operators","volume":"20","year":"2004","journal-title":"Journal of Computer Assisted Learning"},{"issue":"2","key":"key2023012512322329100_ref016","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1108\/eb024480","article-title":"Cognitive styles and searching","volume":"18","year":"1994","journal-title":"Online and CD-Rom Review"},{"key":"key2023012512322329100_ref017","volume-title":"Data Clustering: Theory, Algorithms, and Applications","year":"2007"},{"key":"key2023012512322329100_ref018","article-title":"Lost or found? Discovering data needed for research","year":"2019"},{"issue":"3","key":"key2023012512322329100_ref052","doi-asserted-by":"crossref","first-page":"212","DOI":"10.5860\/crl.66.3.212","article-title":"What have we got to lose? The effect of controlled vocabulary on keyword searching results","volume":"66","year":"2005","journal-title":"College and Research Libraries"},{"issue":"8","key":"key2023012512322329100_ref020","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1002\/asi.20180","article-title":"The effects of expertise and feedback on search term selection and subsequent learning","volume":"56","year":"2005","journal-title":"Journal of the American Society for Information Science and Technology"},{"issue":"1-6","key":"key2023012512322329100_ref021","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1016\/S1389-1286(00)00031-1","article-title":"Web search behavior of internet experts and newbies","volume":"33","year":"2000","journal-title":"Computer Networks"},{"issue":"3","key":"key2023012512322329100_ref022","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1002\/(SICI)1097-4571(199304)44:3<161::AID-ASI5>3.0.CO;2-8","article-title":"Effects of search experience and subject knowledge on the search tactics of novice and experienced searchers","volume":"44","year":"1993","journal-title":"Journal of the American Society for Information Science"},{"issue":"1","key":"key2023012512322329100_ref023","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1016\/j.ipm.2004.10.007","article-title":"How are we searching the world wide web? A comparison of nine search engine transaction logs","volume":"42","year":"2006","journal-title":"Information Processing and Management"},{"issue":"6","key":"key2023012512322329100_ref024","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1016\/j.ipm.2009.05.004","article-title":"Using the taxonomy of cognitive learning to model online searching","volume":"45","year":"2009","journal-title":"Information Processing and Management"},{"key":"key2023012512322329100_ref025","first-page":"1485","article-title":"Characterising dataset search queries","year":"2018"},{"key":"key2023012512322329100_ref026","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/j.websem.2018.11.003","article-title":"Characterising dataset search\u2014an analysis of search logs and data requests","volume":"55","year":"2019","journal-title":"Journal of Web Semantics"},{"issue":"5","key":"key2023012512322329100_ref027","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1108\/10662241011084112","article-title":"Classifying the user intent of web queries using k-means clustering","volume":"20","year":"2010","journal-title":"Internet Research"},{"key":"key2023012512322329100_ref028","first-page":"197","article-title":"Are there any differences in data set retrieval compared to well-known literature retrieval?","year":"2015"},{"key":"key2023012512322329100_ref029","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1016\/j.jbi.2017.09.014","article-title":"Bridging the gap: incorporating a semantic similarity measure for effectively mapping pubmed queries to documents","volume":"75","year":"2017","journal-title":"Journal of Biomedical Informatics"},{"key":"key2023012512322329100_ref030","first-page":"1277","article-title":"The trials and tribulations of working with structured data: -a study on information seeking behaviour","year":"2017"},{"key":"key2023012512322329100_ref031","first-page":"957","article-title":"From word embeddings to document distances","year":"2015"},{"issue":"5","key":"key2023012512322329100_ref032","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1093\/bioinformatics\/btm563","article-title":"Defining clusters from a hierarchical cluster tree: the dynamic tree cut package for r","volume":"24","year":"2008","journal-title":"Bioinformatics"},{"key":"key2023012512322329100_ref033","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1162\/tacl_a_00134","article-title":"Improving distributional similarity with lessons learned from word embeddings","volume":"3","year":"2015","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"key2023012512322329100_ref034","doi-asserted-by":"crossref","unstructured":"Maimon, O. and Rokach, L. (2009), \u201cIntroduction to knowledge discovery and data mining\u201d, in Data Mining and Knowledge Discovery Handbook, Springer, pp.\u00a01-15.","DOI":"10.1007\/978-0-387-09823-4_1"},{"issue":"1","key":"key2023012512322329100_ref035","doi-asserted-by":"crossref","first-page":"29","DOI":"10.7815\/ijorcs.21.2011.011","article-title":"A comparative study on distance measuring approaches for clustering","volume":"2","year":"2011","journal-title":"International Journal of Research in Computer Science"},{"key":"key2023012512322329100_ref036","first-page":"1532","article-title":"Glove: global vectors for word representation","year":"2014"},{"key":"key2023012512322329100_ref037","first-page":"13","article-title":"Understanding user goals in web search","year":"2004"},{"issue":"6","key":"key2023012512322329100_ref038","first-page":"1052","article-title":"Queries in authentic work tasks: the effects of task type and complexity","volume":"72","year":"2016","journal-title":"Journal of Documentation"},{"key":"key2023012512322329100_ref039","unstructured":"Sharifpour, R. (2022), \u201cPython code for processing and clustering a data search log\u201d, Zenodo, doi: 10.5281\/zenodo.6321621."},{"key":"key2023012512322329100_ref053","article-title":"Clarifying search: a user-interface framework for text searches","year":"1997"},{"key":"key2023012512322329100_ref041","first-page":"1245","article-title":"A taxonomy of queries for e-commerce search","year":"2018"},{"key":"key2023012512322329100_ref042","doi-asserted-by":"crossref","unstructured":"Tanioka, K. and Yadohisa, H. (2012), \u201cEffect of data standardization on the result of k-means clustering\u201d, in Challenges at the Interface of Data Analysis, Computer Science, and Optimization, Springer, pp.\u00a059-67.","DOI":"10.1007\/978-3-642-24466-7_7"},{"key":"key2023012512322329100_ref043","first-page":"110","article-title":"Subject knowledge, source of terms, and term selection in query expansion: an analytical study","year":"2002"},{"issue":"301","key":"key2023012512322329100_ref044","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1080\/01621459.1963.10500845","article-title":"Hierarchical grouping to optimize an objective function","volume":"58","year":"1963","journal-title":"Journal of the American Statistical Association"},{"key":"key2023012512322329100_ref045","first-page":"21","article-title":"Investigating behavioral variability in web search","year":"2007"},{"key":"key2023012512322329100_ref046","first-page":"159","article-title":"Studying the use of popular destinations to enhance web search interaction","year":"2007"},{"key":"key2023012512322329100_ref047","first-page":"132","article-title":"Characterizing the influence of domain expertise on web search behavior","year":"2009"},{"issue":"3","key":"key2023012512322329100_ref048","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1002\/asi.10367","article-title":"The effects of domain knowledge on search tactic formulation","volume":"55","year":"2004","journal-title":"Journal of the American Society for Information Science and Technology"},{"issue":"3","key":"key2023012512322329100_ref049","first-page":"249","article-title":"Models in information behaviour research","volume":"35","year":"1999","journal-title":"Journal of Documentation"},{"key":"key2023012512322329100_ref050","unstructured":"Wu, M. and Benn, J. (2022), \u201c2019 search and interaction log from the data catalogue: research data Australia\u201d. doi: 10.5281\/zenodo.6133000."},{"key":"key2023012512322329100_ref051","first-page":"1998","article-title":"Topic mover's distance based document classification","year":"2017"}],"container-title":["Journal of Documentation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/JD-12-2021-0245\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/JD-12-2021-0245\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T22:35:40Z","timestamp":1753396540000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/jd\/article\/79\/1\/66-85\/202966"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,27]]},"references-count":51,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,4,27]]},"published-print":{"date-parts":[[2023,1,10]]}},"alternative-id":["10.1108\/JD-12-2021-0245"],"URL":"https:\/\/doi.org\/10.1108\/jd-12-2021-0245","relation":{},"ISSN":["0022-0418"],"issn-type":[{"value":"0022-0418","type":"print"}],"subject":[],"published":{"date-parts":[[2022,4,27]]}}}