{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T16:37:45Z","timestamp":1760978265339,"version":"3.41.2"},"reference-count":39,"publisher":"Emerald","issue":"3","license":[{"start":{"date-parts":[[2001,6,1]],"date-time":"2001-06-01T00:00:00Z","timestamp":991353600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2001,6,1]]},"abstract":"<jats:p>The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non\u2010relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept\u2010based method to analyse the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system. Statistical differences in textual characteristics of highly relevant and less relevant documents were investigated by applying a facet analysis technique. In highly relevant documents a larger number of aspects of the request were discussed, searchable expressions for the aspects were distributed over a larger set of text paragraphs, and a larger set of unique expressions were used per aspect than in marginally relevant documents. A query expansion experiment verified that the findings of the text analysis can be exploited in formulating more effective queries for best match retrieval in the search for highly relevant documents. 
The results revealed that expanded queries with concept\u2010based structures performed better than unexpanded queries or \u201cnatural language\u201d queries. Further, it was shown that highly relevant documents benefit essentially more from the concept\u2010based QE in ranking than marginally relevant documents.<\/jats:p>","DOI":"10.1108\/eum0000000007087","type":"journal-article","created":{"date-parts":[[2002,11,21]],"date-time":"2002-11-21T21:43:51Z","timestamp":1037915031000},"page":"358-376","source":"Crossref","is-referenced-by-count":17,"title":["Document text characteristics affect the ranking of the most relevant documents by expanded structured queries"],"prefix":"10.1108","volume":"57","author":[{"given":"Eero","family":"Sormunen","sequence":"first","affiliation":[]},{"given":"Jaana","family":"Kek\u00e4l\u00e4inen","sequence":"additional","affiliation":[]},{"given":"Jussi","family":"Koivisto","sequence":"additional","affiliation":[]},{"given":"Kalervo","family":"J\u00e4rvelin","sequence":"additional","affiliation":[]}],"member":"140","reference":[{"key":"p_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199404)45:3<124::AID-ASI2>3.0.CO;2-8"},{"key":"p_2","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199210)43:9<602::AID-ASI3>3.0.CO;2-Q"},{"volume-title":"Information science: integration in perspective. Proceedings of COLIS 2, the Second International Conference on Conceptions of Library and Information Science","year":"1996","author":"Saracevic T.","key":"p_3"},{"key":"p_4","first-page":"313","volume-title":"ASIS '97: Proceedings of the 60th ASIS annual meeting. 
Medford, NJ: Information Today","author":"Saracevic T.","year":"1997"},{"key":"p_5","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(198803)39:2<92::AID-ASI4>3.0.CO;2-P"},{"key":"p_6","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(99)00072-2"},{"key":"p_7","doi-asserted-by":"publisher","DOI":"10.1145\/3166.3197"},{"key":"p_8","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630200204"},{"key":"p_9","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(198805)39:3<161::AID-ASI2>3.0.CO;2-0"},{"key":"p_10","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(98)00041-7"},{"volume-title":"Proceedings of TREC-7. http:\/\/trec.nist.gov\/pubs\/trec7\/papers\/vlc_ overview.pdf.gz> (visited","year":"2000","author":"Hawking D.","key":"p_11"},{"key":"p_12","unstructured":"Sormunen, E. A method for measuring wide range performance of Boolean queries in full-text databases. PhD thesis. Acta Electronica Universitatis Tamperensis. http:\/\/acta.uta.fi\/pdf\/951-44-4732-8.pdf. Tampere: University of Tampere, 2000."},{"key":"p_13","doi-asserted-by":"publisher","DOI":"10.1145\/345508.345541"},{"issue":"3","key":"p_14","first-page":"160","volume":"45","author":"Ingwersen P.","year":"1995","journal-title":"Libri"},{"key":"p_15","doi-asserted-by":"publisher","DOI":"10.1145\/290941.290978"},{"key":"p_16","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009983401464"},{"key":"p_17","doi-asserted-by":"publisher","DOI":"10.1145\/290941.290958"},{"key":"p_18","doi-asserted-by":"publisher","DOI":"10.1145\/290941.290957"},{"key":"p_19","unstructured":"Pirkola, A. Studies on linguistic problems and methods in text retrieval. PhD dissertation, Acta Universitatis Tamperensis 672. Tampere: TAJU, 1999."},{"key":"p_20","unstructured":"Kek\u00e4l\u00e4inen, J. The effects of query complexity, expansion and structure on retrieval performance in probabilistic text retrieval. 
PhD dissertation, Acta Universitatis Tamperensis 678. Tampere: TAJU, 1999."},{"volume-title":"Vapaatekstihaun tehokkuus ja siihen vaikuttavat tekij\u00e4t sanomalehtiaineistoa sis\u00e4lt\u00e4v\u00e4ss\u00e4 tekstikannassa [Free-text searching efficiency and factors affecting it in a newspaper article database]","year":"1994","author":"Sormunen E.","key":"p_21"},{"volume-title":"Information technology: the Fifth Text Retrieval Conference (TREC-5).","year":"1997","author":"Allan J.","key":"p_22"},{"key":"p_23","doi-asserted-by":"publisher","DOI":"10.1108\/eb026953"},{"key":"p_24","doi-asserted-by":"publisher","DOI":"10.1108\/eb026869"},{"key":"p_25","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199108)42:7<490::AID-ASI4>3.0.CO;2-V"},{"volume-title":"Online information retrieval","year":"1986","author":"Harter S.P.","key":"p_26"},{"volume-title":"Information retrieval today","year":"1993","author":"Lancaster F.W.","key":"p_27"},{"key":"p_28","doi-asserted-by":"publisher","DOI":"10.1145\/215206.215352"},{"key":"p_29","volume-title":"Practical nonparametric statistics","author":"Conover W.J.","year":"1980","edition":"2"},{"volume-title":"Nonparametric statistics for the behavioral sciences","year":"1998","author":"Siegel S.","key":"p_30"},{"key":"p_31","unstructured":"Turtle, H.R. Inference networks for document retrieval. PhD dissertation, Computer and Information Science Department, University of Massachusetts. 1992. 
(COINS Technical Report 90-92)"},{"key":"p_32","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199505)46:4<272::AID-ASI4>3.0.CO;2-T"},{"key":"p_33","doi-asserted-by":"publisher","DOI":"10.1145\/160688.160693"},{"key":"p_34","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4471-2099-5_31"},{"key":"p_35","doi-asserted-by":"publisher","DOI":"10.1145\/258525.258561"},{"key":"p_36","doi-asserted-by":"publisher","DOI":"10.1145\/321439.321441"},{"volume-title":"Information technology: the Sixth Text Retrieval Conference (TREC-6).","year":"1997","author":"Hawking D.","key":"p_37"},{"key":"p_38","doi-asserted-by":"publisher","DOI":"10.1145\/290941.290995"},{"key":"p_39","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1145\/345508.345545","volume-title":"Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"J\u00e4rvelin K.","year":"2000"}],"container-title":["Journal of Documentation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/EUM0000000007087\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/EUM0000000007087\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,25]],"date-time":"2025-07-25T00:52:31Z","timestamp":1753404751000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/jd\/article\/57\/3\/358-376\/199432"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2001,6,1]]},"references-count":39,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2001,6,1]]}},"alternative-id":["10.1108\/EUM0000000007087"],"URL":"https:\/\/doi.org\/10.1108\/eum0000000007087","relation":{},"ISSN":["0022-0418"],"issn-type":[{"type":"print","value":"0022-0418"}],"subject":[],"pu
blished":{"date-parts":[[2001,6,1]]}}}