{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T18:05:32Z","timestamp":1768586732616,"version":"3.49.0"},"reference-count":60,"publisher":"Wiley","issue":"2","license":[{"start":{"date-parts":[[2021,12,13]],"date-time":"2021-12-13T00:00:00Z","timestamp":1639353600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["01IS18036A"],"award-info":[{"award-number":["01IS18036A"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["BO3139\/4\u20103"],"award-info":[{"award-number":["BO3139\/4\u20103"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["BO3139\/6\u20102"],"award-info":[{"award-number":["BO3139\/6\u20102"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["BO3139\/7\u20101"],"award-info":[{"award-number":["BO3139\/7\u20101"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["wires.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["WIREs Data Min &amp; Knowl"],"published-print":{"date-parts":[[2022,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In recent years, the need for neutral benchmark studies that focus on the comparison of methods coming from computational sciences has been increasingly recognized by the scientific community. While general advice on the design and analysis of neutral benchmark studies can be found in recent literature, a certain flexibility always exists. This includes the choice of data sets and performance measures, the handling of missing performance values, and the way the performance values are aggregated over the data sets. As a consequence of this flexibility, researchers may be concerned about how their choices affect the results or, in the worst case, may be tempted to engage in questionable research practices (e.g., the selective reporting of results or the post hoc modification of design or analysis components) to fit their expectations. To raise awareness for this issue, we use an example benchmark study to illustrate how variable benchmark results can be when all possible combinations of a range of design and analysis options are considered. We then demonstrate how the impact of each choice on the results can be assessed using multidimensional unfolding. In conclusion, based on previous literature and on our illustrative example, we claim that the multiplicity of design and analysis options combined with questionable research practices lead to biased interpretations of benchmark results and to over\u2010optimistic conclusions. This issue should be considered by computational researchers when designing and analyzing their benchmark studies and by the scientific community in general in an effort towards more reliable benchmark results.<\/jats:p><jats:p>This article is categorized under:<jats:list list-type=\"simple\">\n<jats:list-item><jats:p>Technologies &gt; Visualization<\/jats:p><\/jats:list-item>\n<jats:list-item><jats:p>Technologies &gt; Data Preprocessing<\/jats:p><\/jats:list-item>\n<jats:list-item><jats:p>Technologies &gt; Structure Discovery and Clustering<\/jats:p><\/jats:list-item>\n<\/jats:list><\/jats:p>","DOI":"10.1002\/widm.1441","type":"journal-article","created":{"date-parts":[[2021,12,14]],"date-time":"2021-12-14T02:46:01Z","timestamp":1639449961000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":27,"title":["Over\u2010optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results"],"prefix":"10.1002","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2425-7858","authenticated-orcid":false,"given":"Christina","family":"Nie\u00dfl","sequence":"first","affiliation":[{"name":"Institute for Medical Information Processing, Biometry and Epidemiology Ludwig Maximilians University Munich  Munich Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4893-5812","authenticated-orcid":false,"given":"Moritz","family":"Herrmann","sequence":"additional","affiliation":[{"name":"Department of Statistics Ludwig Maximilians University Munich  Munich Germany"}]},{"given":"Chiara","family":"Wiedemann","sequence":"additional","affiliation":[{"name":"Institute for Medical Information Processing, Biometry and Epidemiology Ludwig Maximilians University Munich  Munich Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5324-5966","authenticated-orcid":false,"given":"Giuseppe","family":"Casalicchio","sequence":"additional","affiliation":[{"name":"Department of Statistics Ludwig Maximilians University Munich  Munich Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2729-0947","authenticated-orcid":false,"given":"Anne\u2010Laure","family":"Boulesteix","sequence":"additional","affiliation":[{"name":"Institute for Medical Information Processing, Biometry and Epidemiology Ludwig Maximilians University Munich  Munich Germany"}]}],"member":"311","published-online":{"date-parts":[[2021,12,13]]},"reference":[{"key":"e_1_2_12_2_1","doi-asserted-by":"publisher","DOI":"10.1080\/00031305.2018.1543137"},{"key":"e_1_2_12_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00180-013-0420-y"},{"key":"e_1_2_12_4_1","doi-asserted-by":"publisher","DOI":"10.1093\/biostatistics\/kxy006"},{"key":"e_1_2_12_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csbj.2020.11.049"},{"key":"e_1_2_12_6_1","volume-title":"Modern multidimensional scaling: Theory and applications","author":"Borg I.","year":"2005"},{"key":"e_1_2_12_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31848-1"},{"key":"e_1_2_12_8_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1004191"},{"key":"e_1_2_12_9_1","doi-asserted-by":"publisher","DOI":"10.1002\/bimj.201700129"},{"key":"e_1_2_12_10_1","doi-asserted-by":"publisher","DOI":"10.1080\/00031305.2015.1005128"},{"key":"e_1_2_12_11_1","doi-asserted-by":"publisher","DOI":"10.1111\/1740-9713.01444"},{"key":"e_1_2_12_12_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0061562"},{"key":"e_1_2_12_13_1","doi-asserted-by":"publisher","DOI":"10.1186\/s12874-017-0417-2"},{"key":"e_1_2_12_14_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13059-021-02365-4"},{"key":"e_1_2_12_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11336-001-0908-1"},{"key":"e_1_2_12_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cortex.2012.12.016"},{"key":"e_1_2_12_17_1","volume-title":"A theory of data","author":"Coombs C. H.","year":"1964"},{"key":"e_1_2_12_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41060-019-00185-1"},{"key":"e_1_2_12_19_1","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v031.i04"},{"key":"e_1_2_12_20_1","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","author":"Dem\u0161ar J.","year":"2006","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_12_21_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976698300017197"},{"key":"e_1_2_12_22_1","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-017-1486-2"},{"key":"e_1_2_12_23_1","first-page":"5","article-title":"Domain\u2010based benchmark experiments: Exploratory and inferential analysis","volume":"41","author":"Eugster M. J. A.","year":"2012","journal-title":"Austrian Journal of Statistics"},{"key":"e_1_2_12_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csda.2013.08.007"},{"key":"e_1_2_12_25_1","first-page":"3133","article-title":"Do we need hundreds of classifiers to solve real world classification problems?","volume":"15","author":"Fern\u00e1ndez\u2010Delgado M.","year":"2014","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_12_26_1","doi-asserted-by":"publisher","DOI":"10.7717\/peerj.6160"},{"key":"e_1_2_12_27_1","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jproteome.5b00852"},{"key":"e_1_2_12_28_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-0258(19990915\/30)18:17\/18<2529::AID-SIM274>3.0.CO;2-5"},{"key":"e_1_2_12_29_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pbio.1002106"},{"key":"e_1_2_12_30_1","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbaa167"},{"key":"e_1_2_12_31_1","doi-asserted-by":"publisher","DOI":"10.1098\/rsos.201925"},{"key":"e_1_2_12_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-70981-7_19"},{"key":"e_1_2_12_33_1","doi-asserted-by":"publisher","DOI":"10.1198\/106186005X59630"},{"key":"e_1_2_12_34_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pmed.0020124"},{"key":"e_1_2_12_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tics.2014.02.010"},{"key":"e_1_2_12_36_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btq323"},{"key":"e_1_2_12_37_1","doi-asserted-by":"publisher","DOI":"10.1177\/0956797611430953"},{"key":"e_1_2_12_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2016.05.001"},{"key":"e_1_2_12_39_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13059-019-1887-9"},{"key":"e_1_2_12_40_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btaa191"},{"key":"e_1_2_12_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2012.09.022"},{"key":"e_1_2_12_42_1","first-page":"772","article-title":"Goodness\u2010of\u2010fit assessment in multidimensional scaling and unfolding","volume":"51","author":"Mair P.","year":"2016","journal-title":"Multivariate Behavioral Research"},{"key":"e_1_2_12_43_1","article-title":"More on multidimensional scaling and unfolding in R: smacof version 2","author":"Mair P.","year":"2021","journal-title":"Journal of Statistical Software"},{"key":"e_1_2_12_44_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-019-09406-4"},{"key":"e_1_2_12_45_1","doi-asserted-by":"publisher","DOI":"10.1162\/EVCO_a_00134"},{"key":"e_1_2_12_46_1","doi-asserted-by":"publisher","DOI":"10.1002\/sim.8086"},{"key":"e_1_2_12_47_1","doi-asserted-by":"publisher","DOI":"10.1038\/msb.2011.70"},{"key":"e_1_2_12_48_1","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-015-0610-4"},{"key":"e_1_2_12_49_1","doi-asserted-by":"publisher","DOI":"10.1038\/526182a"},{"key":"e_1_2_12_50_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2016.12.023"},{"key":"e_1_2_12_51_1","doi-asserted-by":"crossref","unstructured":"Orzechowski P. La Cava W. &Moore J. H.(2018).Where are we now? A large benchmark study of recent symbolic regression methods. InProceedings of the Genetic and Evolutionary Computation Conference GECCO '18 Association for Computing Machinery New York NY USA (pp. 1183\u20131190).","DOI":"10.1145\/3205455.3205539"},{"key":"e_1_2_12_52_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13059-019-1846-5"},{"key":"e_1_2_12_53_1","doi-asserted-by":"publisher","DOI":"10.1177\/0956797611417632"},{"key":"e_1_2_12_54_1","doi-asserted-by":"publisher","DOI":"10.1002\/sim.4154"},{"key":"e_1_2_12_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3331449"},{"key":"e_1_2_12_56_1","doi-asserted-by":"publisher","DOI":"10.1177\/1745691612463078"},{"key":"e_1_2_12_57_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13059-019-1738-8"},{"key":"e_1_2_12_58_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4471-0123-9_3"},{"key":"e_1_2_12_59_1","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbaa321"},{"key":"e_1_2_12_60_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp605"},{"key":"e_1_2_12_61_1","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1330"}],"container-title":["WIREs Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/widm.1441","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/widm.1441","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/wires.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/widm.1441","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,24]],"date-time":"2023-08-24T02:36:02Z","timestamp":1692844562000},"score":1,"resource":{"primary":{"URL":"https:\/\/wires.onlinelibrary.wiley.com\/doi\/10.1002\/widm.1441"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,13]]},"references-count":60,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,3]]}},"alternative-id":["10.1002\/widm.1441"],"URL":"https:\/\/doi.org\/10.1002\/widm.1441","archive":["Portico"],"relation":{},"ISSN":["1942-4787","1942-4795"],"issn-type":[{"value":"1942-4787","type":"print"},{"value":"1942-4795","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,13]]},"assertion":[{"value":"2021-06-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-11-06","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e1441"}}