{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T19:13:57Z","timestamp":1768590837478,"version":"3.49.0"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2019,3,27]],"date-time":"2019-03-27T00:00:00Z","timestamp":1553644800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100011002","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61622208, 61732008, 61532011"],"award-info":[{"award-number":["61622208, 61732008, 61532011"]}],"id":[{"id":"10.13039\/501100011002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Key Research and Development Program of China","award":["2018YFC0831700"],"award-info":[{"award-number":["2018YFC0831700"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2019,7,31]]},"abstract":"<jats:p>Image search engines differ significantly from general web search engines in the way of presenting search results. The difference leads to different interaction and examination behavior patterns, and therefore requires changes in evaluation methodologies. However, evaluation of image search still utilizes the methods for general web search. In particular, offline metrics are calculated based on coarse-fine topical relevance judgments with the assumption that users examine results in a sequential manner.<\/jats:p>\n          <jats:p>In this article, we investigate annotation methods via crowdsourcing for image search evaluation based on a lab-based user study. Using user satisfaction as the golden standard, we make several interesting findings. First, instead of item-based annotation, annotating relevance in a row-based way is more efficient without hurting performance. Second, besides topical relevance, image quality plays a crucial role when evaluating the image search results, and the importance of image quality changes with search intent. Third, compared to traditional four-level scales, the fine-grain annotation method outperforms significantly. To our best knowledge, our work is the first to systematically study how diverse factors in data annotation impact image search evaluation. Our results suggest different strategies for exploiting the crowdsourcing to get data annotated under different conditions.<\/jats:p>","DOI":"10.1145\/3309994","type":"journal-article","created":{"date-parts":[[2019,3,28]],"date-time":"2019-03-28T12:23:24Z","timestamp":1553775804000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["On Annotation Methodologies for Image Search Evaluation"],"prefix":"10.1145","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1727-8311","authenticated-orcid":false,"given":"Yunqiu","family":"Shao","sequence":"first","affiliation":[{"name":"Tsinghua University"}]},{"given":"Yiqun","family":"Liu","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Fan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Min","family":"Zhang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Shaoping","family":"Ma","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]}],"member":"320","published-online":{"date-parts":[[2019,3,27]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277902"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation","volume":"15","author":"Alonso Omar","year":"2009","unstructured":"Omar Alonso and Stefano Mizzaro . 2009 . Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment . In Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation , Vol. 15 . 16. Omar Alonso and Stefano Mizzaro. 2009. Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment. In Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation, Vol. 15. 16."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2012.01.004"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03658-3_40"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210027"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2010037"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646033"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080804"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2552193"},{"key":"e_1_2_1_10_1","volume-title":"Overview of the TREC 2004 terabyte track. In Proceedings of the Text Retrieval Conference (TREC\u201904)","volume":"4","author":"Clarke Charles L. A.","year":"2004","unstructured":"Charles L. A. Clarke , Nick Craswell , and Ian Soboroff . 2004 . Overview of the TREC 2004 terabyte track. In Proceedings of the Text Retrieval Conference (TREC\u201904) , Vol. 4 . 74. Charles L. A. Clarke, Nick Craswell, and Ian Soboroff. 2004. Overview of the TREC 2004 terabyte track. In Proceedings of the Text Retrieval Conference (TREC\u201904), Vol. 4. 74."},{"key":"e_1_2_1_11_1","volume-title":"Factors Determining the Performance of Indexing Systems","volume":"1","author":"Cleverdon C. W.","unstructured":"C. W. Cleverdon and E. M. Keen . 1966. Aslib--Cranfield research project . Factors Determining the Performance of Indexing Systems , Vol. 1 . College of Aeronautics. C. W. Cleverdon and E. M. Keen. 1966. Aslib--Cranfield research project. Factors Determining the Performance of Indexing Systems, Vol. 1. College of Aeronautics."},{"key":"e_1_2_1_12_1","volume-title":"Applied Multiple Regression\/Correlation Analysis for the Behavioral Sciences","author":"Cohen Jacob","unstructured":"Jacob Cohen and Patricia Cohen . 1983. Applied Multiple Regression\/Correlation Analysis for the Behavioral Sciences ( 2 nd ed.). Lawrence Erlbaum Associates . Jacob Cohen and Patricia Cohen. 1983. Applied Multiple Regression\/Correlation Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.","edition":"2"},{"key":"e_1_2_1_13_1","volume-title":"TREC 2014 Web Track Overview. Technical Report","author":"Collins-Thompson Kevyn","unstructured":"Kevyn Collins-Thompson , Craig Macdonald , Paul Bennett , Fernando Diaz , and Ellen M. Voorhees . 2015 . TREC 2014 Web Track Overview. Technical Report . Michigan University, Ann Arbor, MI. Kevyn Collins-Thompson, Craig Macdonald, Paul Bennett, Fernando Diaz, and Ellen M. Voorhees. 2015. TREC 2014 Web Track Overview. Technical Report. Michigan University, Ann Arbor, MI."},{"key":"e_1_2_1_14_1","volume-title":"The optimal number of response alternatives for a scale: A review. Journal of Marketing Research","author":"Eli P.","year":"1980","unstructured":"Eli P. Cox III. 1980. The optimal number of response alternatives for a scale: A review. Journal of Marketing Research ( 1980 ), 407--422. Eli P. Cox III. 1980. The optimal number of response alternatives for a scale: A review. Journal of Marketing Research (1980), 407--422."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159654"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835449.1835458"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2072298.2072308"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the ASIST Annual Meeting","volume":"36","author":"Goodrum Abby","year":"1999","unstructured":"Abby Goodrum and Amanda Spink . 1999 . Visual information seeking: A study of image queries on the World Wide Web . In Proceedings of the ASIST Annual Meeting , Vol. 36 . 665--74. Abby Goodrum and Amanda Spink. 1999. Visual information seeking: A study of image queries on the World Wide Web. In Proceedings of the ASIST Annual Meeting, Vol. 36. 665--74."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983323.2983846"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1080\/19312450709336664"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020165.3020188"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-28997-2_16"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582418"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1008992.1009057"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210033"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2609631"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.2307\/2529310"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766462.2767721"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080795"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing.","author":"Maddalena Eddy","year":"2016","unstructured":"Eddy Maddalena , Marco Basaldella , Dario De Nart , Dante Degl\u2019Innocenti , Stefano Mizzaro , and Gianluca Demartini . 2016 . Crowdsourcing relevance assessments: The unexpected benefits of limiting the time to judge . In Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing. Eddy Maddalena, Marco Basaldella, Dario De Nart, Dante Degl\u2019Innocenti, Stefano Mizzaro, and Gianluca Demartini. 2016. Crowdsourcing relevance assessments: The unexpected benefits of limiting the time to judge. In Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3002172"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2911507"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the Workshop on Crowdsourcing for Search and Data Mining (CSDM) at the 4th ACM International Conference on Web Search and Data Mining (WSDM\u201911)","author":"McCreadie Richard","year":"2011","unstructured":"Richard McCreadie , Craig Macdonald , and Iadh Ounis . 2011 . Crowdsourcing blog track top news judgments at TREC . In Proceedings of the Workshop on Crowdsourcing for Search and Data Mining (CSDM) at the 4th ACM International Conference on Web Search and Data Mining (WSDM\u201911) . 23--26. Richard McCreadie, Craig Macdonald, and Iadh Ounis. 2011. Crowdsourcing blog track top news judgments at TREC. In Proceedings of the Workshop on Crowdsourcing for Search and Data Mining (CSDM) at the 4th ACM International Conference on Web Search and Data Mining (WSDM\u201911). 23--26."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2507665"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1416950.1416952"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1745-4557.1977.tb00942.x"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2911532"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2702123.2702527"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1108\/14684520510628864"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210052"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210182"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2484028.2484031"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835449.1835542"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/1315930.1315946"},{"key":"e_1_2_1_45_1","volume-title":"Handbook of Parametric and Nonparametric Statistical Procedures","author":"Sheskin David J.","unstructured":"David J. Sheskin . 2003. Handbook of Parametric and Nonparametric Statistical Procedures . CRC Press, Boca Raton , FL. David J. Sheskin. 2003. Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press, Boca Raton, FL."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348300"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2488388.2488493"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/564376.564433"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2671188.2749334"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1155\/2013\/619767"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.5555\/297427.297450"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766462.2767760"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/1526709.1526756"},{"key":"e_1_2_1_55_1","volume-title":"Harman","author":"Voorhees Ellen M.","year":"2005","unstructured":"Ellen M. Voorhees and Donna K . Harman . 2005 . TREC : Experiment and Evaluation in Information Retrieval. Vol. 1 . MIT Press , Cambridge, MA. Ellen M. Voorhees and Donna K. Harman. 2005. TREC: Experiment and Evaluation in Information Retrieval. Vol. 1. MIT Press, Cambridge, MA."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159686"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080799"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080841"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210059"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-012-9206-z"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3309994","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3309994","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:53:36Z","timestamp":1750204416000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3309994"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,27]]},"references-count":59,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2019,7,31]]}},"alternative-id":["10.1145\/3309994"],"URL":"https:\/\/doi.org\/10.1145\/3309994","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,3,27]]},"assertion":[{"value":"2018-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-03-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}