{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T17:38:32Z","timestamp":1740159512452,"version":"3.37.3"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,2,7]],"date-time":"2020-02-07T00:00:00Z","timestamp":1581033600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,2,7]],"date-time":"2020-02-07T00:00:00Z","timestamp":1581033600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"GESIS \u2013 Leibniz-Institut f\u00fcr Sozialwissenschaften e.V."}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Datenbank Spektrum"],"published-print":{"date-parts":[[2020,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Academic search systems aid users in finding information covering specific topics of scientific interest and have evolved from early catalog-based library systems to modern web-scale systems. However, evaluating the performance of the underlying retrieval approaches remains a\u00a0challenge. An increasing amount of requirements for producing accurate retrieval results have to be considered, e.g., close integration of the system\u2019s users. Due to these requirements, small to mid-size academic search systems cannot evaluate their retrieval system in-house. Evaluation infrastructures for shared tasks alleviate this situation. They allow researchers to experiment with retrieval approaches in specific search and recommendation scenarios without building their own infrastructure. In this paper, we elaborate on the benefits and shortcomings of four state-of-the-art evaluation infrastructures on search and recommendation tasks concerning the following requirements: support for online and offline evaluations, domain specificity of shared tasks, and reproducibility of experiments and results. In addition, we introduce an evaluation infrastructure concept design aiming at reducing the shortcomings in shared tasks for search and recommender systems.<\/jats:p>","DOI":"10.1007\/s13222-020-00335-x","type":"journal-article","created":{"date-parts":[[2020,2,7]],"date-time":"2020-02-07T11:04:05Z","timestamp":1581073445000},"page":"29-36","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Evaluation Infrastructures for Academic Shared Tasks"],"prefix":"10.1007","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5441-7640","authenticated-orcid":false,"given":"Johann","family":"Schaible","sequence":"first","affiliation":[]},{"given":"Timo","family":"Breuer","sequence":"additional","affiliation":[]},{"given":"Narges","family":"Tavakolpoursaleh","sequence":"additional","affiliation":[]},{"given":"Bernd","family":"M\u00fcller","sequence":"additional","affiliation":[]},{"given":"Benjamin","family":"Wolff","sequence":"additional","affiliation":[]},{"given":"Philipp","family":"Schaer","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,2,7]]},"reference":[{"key":"335_CR1","doi-asserted-by":"publisher","DOI":"10.1145\/2661829.2661962","volume-title":"Head first: living labs for ad-hoc search evaluation","author":"K Balog","year":"2014","unstructured":"Balog K, Kelly L, Schuth A (2014) Head first: living labs for ad-hoc search evaluation. Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. ACM, New York"},{"key":"335_CR2","volume-title":"Overview of the trec 2016 open search track","author":"K Balog","year":"2016","unstructured":"Balog K, Schuth A, Dekker P, Schaer P, Tavakolpoursaleh N, Chuang PY (2016) Overview of the trec 2016 open search track. Proceedings of the 25th Text REtrieval Conference 2016. Gaithersburg, NIST"},{"key":"335_CR3","doi-asserted-by":"publisher","DOI":"10.1145\/2532508.2532511","volume-title":"A\u00a0comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation","author":"J Beel","year":"2013","unstructured":"Beel J, Genzmehr M, Langer S, N\u00fcrnberger A, Gipp B (2013) A\u00a0comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation. Proceedings of the international workshop on reproducibility and replication in recommender systems evaluation. ACM, New York"},{"key":"335_CR4","volume-title":"STELLA: towards a\u00a0framework for the reproducibility of online search experiments","author":"T Breuer","year":"2019","unstructured":"Breuer T, Schaer P, Tavakolpoursaleh N, Schaible J, Wolff B, M\u00fcller B (2019) STELLA: towards a\u00a0framework for the reproducibility of online search experiments. Proceedings of the Open-Source IR Replicability Challenge co-located with SIGIR, OSIRRC@SIGIR."},{"key":"335_CR5","doi-asserted-by":"publisher","DOI":"10.1145\/2637002.2637028","volume-title":"Shedding light on a\u00a0living lab: The clef newsreel open recommendation platform","author":"T Brodt","year":"2014","unstructured":"Brodt T, Hopfgartner F (2014) Shedding light on a\u00a0living lab: The clef newsreel open recommendation platform. IIiX\u201914: Proceedings of the Information Interaction in Context Conference. ACM, New York"},{"key":"335_CR6","volume-title":"Are we really making much progress? a\u00a0worrying analysis of recent neural recommendation approaches","author":"MF Dacrema","year":"2019","unstructured":"Dacrema MF, Cremonesi P, Jannach D (2019) Are we really making much progress? a\u00a0worrying analysis of recent neural recommendation approaches. Proceedings of the 13th ACM Conference on Recommender Systems, RecSys \u201919. ACM, New York"},{"issue":"1","key":"335_CR7","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1145\/2964797.2964808","volume":"50","author":"N Ferro","year":"2016","unstructured":"Ferro N, Fuhr N, J\u00e4rvelin K, Kando N, Lippold M, Zobel J (2016) Increasing reproducibility in IR: Findings from the dagstuhl seminar on \u201creproducibility of data-oriented experiments in e\u2011science\u201d. ACM SIGIR Forum 50(1):68\u201382","journal-title":"ACM SIGIR Forum"},{"key":"335_CR8","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-22948-1","volume-title":"Information retrieval evaluation in a\u00a0changing world \u2013 lessons learned from 20 years of CLEF","author":"N Ferro","year":"2019","unstructured":"Ferro N, Peters C (2019) From multilingual to multimodal: The evolution of CLEF over two decades. In: Information retrieval evaluation in a\u00a0changing world \u2013 lessons learned from 20 years of CLEF"},{"key":"335_CR9","volume-title":"Information retrieval evaluation in a\u00a0changing world \u2013 lessons learned from 20 years of CLEF","author":"N Fuhr","year":"2019","unstructured":"Fuhr N (2019) Reproducibility and validity in CLEF. In: Information retrieval evaluation in a\u00a0changing world \u2013 lessons learned from 20 years of CLEF"},{"key":"335_CR10","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348501","volume-title":"Ousting ivory tower research: Towards a\u00a0web framework for providing experiments as a\u00a0service","author":"T Gollub","year":"2012","unstructured":"Gollub T, Stein B, Burrows S (2012) Ousting ivory tower research: Towards a\u00a0web framework for providing experiments as a\u00a0service. Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR \u201912. ACM, New York"},{"key":"335_CR11","volume-title":"TIRA: configuring, executing, and disseminating information retrieval experiments","author":"T Gollub","year":"2012","unstructured":"Gollub T, Stein B, Burrows S, Hoppe D (2012) TIRA: configuring, executing, and disseminating information retrieval experiments. 9th International Workshop on Text-based Information Retrieval (TIR 2012) at DEXA. IEEE, Los Alamitos"},{"issue":"5","key":"335_CR12","doi-asserted-by":"publisher","first-page":"419","DOI":"10.1002\/asi.24165","volume":"70","author":"K Gregory","year":"2019","unstructured":"Gregory K, Groth P, Cousijn H, Scharnhorst A, Wyatt S (2019) Searching data: a\u00a0review of observational data retrieval practices in selected disciplines. J\u00a0Assn Inf Sci Tec 70(5):419\u2013432","journal-title":"J Assn Inf Sci Tec"},{"key":"335_CR13","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1007\/978-1-4899-7637-6_8","volume-title":"Recommender systems handbook","author":"A Gunawardana","year":"2015","unstructured":"Gunawardana A, Shani G (2015) Evaluating recommender systems. In: Recommender systems handbook. Springer, Heidelberg, Berlin, New York, pp 265\u2013308"},{"issue":"1","key":"335_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/1500000051","volume":"10","author":"K Hofmann","year":"2016","unstructured":"Hofmann K, Li L, Radlinski F et al (2016) Online evaluation for information retrieval. Found Trends Inf Retr 10(1):1\u2013117","journal-title":"Found Trends Inf Retr"},{"issue":"2","key":"335_CR15","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1145\/2888422.2888443","volume":"49","author":"F Hopfgartner","year":"2015","unstructured":"Hopfgartner F, Brodt T, Seiler J, Kille B, Lommatzsch A, Larson M, Turrin R, Ser\u00e9ny A (2015) Benchmarking news recommendations: the CLEF newsreel use case. ACM SIGIR Forum 49(2):129\u2013136","journal-title":"ACM SIGIR Forum"},{"issue":"1","key":"335_CR16","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1145\/2795403.2795416","volume":"49","author":"F Hopfgartner","year":"2015","unstructured":"Hopfgartner F, Hanbury A, M\u00fcller H, Kando N, Mercer S, Kalpathy-Cramer J, Potthast M, Gollub T, Krithara A, Lin J, Balog K, Eggel I (2015) Report on the evaluation-as-a-service (eaas) expert workshop. ACM SIGIR Forum 49(1):57\u201365","journal-title":"ACM SIGIR Forum"},{"issue":"13","key":"335_CR17","first-page":"1","volume":"10","author":"R Jagerman","year":"2018","unstructured":"Jagerman R, Balog K, de Rijke M (2018) Opensearch: lessons learned from an online evaluation campaign. J\u00a0Data Inf Qual 10(13):1\u201315","journal-title":"J Data Inf Qual"},{"issue":"5","key":"335_CR18","doi-asserted-by":"publisher","first-page":"456","DOI":"10.1007\/s10791-017-9308-8","volume":"20","author":"S Karanam","year":"2017","unstructured":"Karanam S, Jorge-Botana G, Olmos R, van Oostendorp H (2017) The role of domain knowledge in cognitive modeling of information search. Inf Retr\u00a0J 20(5):456\u2013479","journal-title":"Inf Retr J"},{"key":"335_CR19","volume-title":"Working notes Proceedings of the MediaEval workshop","author":"A Lommatzsch","year":"2018","unstructured":"Lommatzsch A, Kille B, Hopfgartner F, Ramming L (2018) NewsREEL multimedia at mediaeval 2018: news recommendation with image and text content. In: Working notes Proceedings of the MediaEval workshop"},{"issue":"2","key":"335_CR20","doi-asserted-by":"publisher","first-page":"336","DOI":"10.1002\/asi.21669","volume":"63","author":"X Niu","year":"2012","unstructured":"Niu X, Hemminger BM (2012) A\u00a0study of factors that affect the information-seeking behavior of academic scientists. J\u00a0Am Soc Inf Sci 63(2):336\u2013353","journal-title":"J Am Soc Inf Sci"},{"key":"335_CR21","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1007\/978-1-4939-8561-6_15","volume":"1807","author":"S Peng","year":"2018","unstructured":"Peng S, Mamitsuka H, Zhu S (2018) MeSHLabeler and DeepMeSH: recent progress in large-scale MeSH indexing. Methods Mol Biol 1807:203\u2013209","journal-title":"Methods Mol Biol"},{"issue":"8","key":"335_CR22","doi-asserted-by":"publisher","first-page":"1883","DOI":"10.1002\/asi.23502","volume":"67","author":"S Pontis","year":"2016","unstructured":"Pontis S, Kefalidou G, Blandford A, Forth J, Makri S, Sharples S, Wiggins G, Woods M (2016) Academics\u2019 responses to encountered information: context matters. J\u00a0Assn Inf Sci Tec 67(8):1883\u20131903","journal-title":"J Assn Inf Sci Tec"},{"key":"335_CR23","volume-title":"Information retrieval evaluation in a\u00a0changing world \u2013 lessons learned from 20 years of CLEF","author":"M Potthast","year":"2019","unstructured":"Potthast M, Gollub T, Wiegmann M, Stein B (2019) TIRA integrated research architecture. In: Information retrieval evaluation in a\u00a0changing world \u2013 lessons learned from 20 years of CLEF"},{"key":"335_CR24","volume-title":"Information retrieval evaluation in a\u00a0changing world \u2013 lessons learned from 20 years of CLEF","author":"P Rosso","year":"2019","unstructured":"Rosso P, Potthast M, Stein B, Stamatatos E, Pardo FMR, Daelemans W (2019) Evolution of the PAN lab on digital text forensics. In: Information retrieval evaluation in a\u00a0changing world \u2013 lessons learned from 20 years of CLEF"},{"key":"335_CR25","series-title":"Lecture notes in computer science","volume-title":"Advances in information retrieval \u2013 42nd European Conference on IR Research, ECIR 2020","author":"P Schaer","year":"2020","unstructured":"Schaer P, Schaible J, M\u00fcller B (2020) Living labs for academic search at clef 2020. In: Advances in information retrieval \u2013 42nd European Conference on IR Research, ECIR 2020. Lecture notes in computer science. Springer, Heidelberg, Berlin, New York"},{"key":"335_CR26","volume-title":"Using word embeddings for recommending datasets based on scientific publications","author":"N Tavakolpoursaleh","year":"2019","unstructured":"Tavakolpoursaleh N, Schaible J, Dietze S (2019) Using word embeddings for recommending datasets based on scientific publications. Proceedings of the Conference on \u201cLernen, Wissen, Daten, Analysen\u201d, Berlin"},{"issue":"138","key":"335_CR27","first-page":"1","volume":"16","author":"G Tsatsaronis","year":"2015","unstructured":"Tsatsaronis G, Balikas G, Malakasiotis P, Partalas I, Zschunke M, Alvers MR, Weissenborn D, Krithara A, Petridis S, Polychronopoulos D, Almirantis Y, Pavlopoulos J, Baskiotis N, Gallinari P, Arti\u00e8res T, Ngomo AN, Heino N, Gaussier \u00c9, Barrio-Alvers L, Schroeder M, Androutsopoulos I, Paliouras G (2015) An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics 16(138):1\u201328","journal-title":"BMC Bioinformatics"}],"container-title":["Datenbank-Spektrum"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s13222-020-00335-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s13222-020-00335-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s13222-020-00335-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,8]],"date-time":"2021-02-08T22:24:24Z","timestamp":1612823064000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s13222-020-00335-x"}},"subtitle":["Requirements and Concept Design for Search and Recommendation Scenarios"],"short-title":[],"issued":{"date-parts":[[2020,2,7]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,3]]}},"alternative-id":["335"],"URL":"https:\/\/doi.org\/10.1007\/s13222-020-00335-x","relation":{},"ISSN":["1618-2162","1610-1995"],"issn-type":[{"type":"print","value":"1618-2162"},{"type":"electronic","value":"1610-1995"}],"subject":[],"published":{"date-parts":[[2020,2,7]]},"assertion":[{"value":"15 October 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 January 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 February 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}