{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T06:16:57Z","timestamp":1775283417866,"version":"3.50.1"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2018,10,29]],"date-time":"2018-10-29T00:00:00Z","timestamp":1540771200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CNS-1405688, IIS-1423002"],"award-info":[{"award-number":["CNS-1405688, IIS-1423002"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2018,12,31]]},"abstract":"<jats:p>\n            This work tackles the perennial problem of reproducible baselines in information retrieval research, focusing on bag-of-words ranking models. Although academic information retrieval researchers have a long history of building and sharing systems, they are primarily designed to facilitate the publication of research papers. As such, these systems are often incomplete, inflexible, poorly documented, difficult to use, and slow, particularly in the context of modern web-scale collections. Furthermore, the growing complexity of modern software ecosystems and the resource constraints most academic research groups operate under make maintaining open-source systems a constant struggle. However, except for a small number of companies (mostly commercial web search engines) that deploy custom infrastructure, Lucene has become the\n            <jats:italic>de facto<\/jats:italic>\n            platform in industry for building search applications. Lucene has an active developer base, a large audience of users, and diverse capabilities to work with heterogeneous collections at scale. However, it lacks systematic support for\n            <jats:italic>ad hoc<\/jats:italic>\n            experimentation using standard test collections. We describe Anserini, an information retrieval toolkit built on Lucene that fills this gap. Our goal is to simplify\n            <jats:italic>ad hoc<\/jats:italic>\n            experimentation and allow researchers to easily reproduce results with modern bag-of-words ranking models on diverse test collections. With Anserini, we demonstrate that Lucene provides a suitable framework for supporting information retrieval research. Experiments show that our system efficiently indexes large web collections, provides modern ranking models that are on par with research implementations in terms of effectiveness, and supports low-latency query evaluation to facilitate rapid experimentation\n          <\/jats:p>","DOI":"10.1145\/3239571","type":"journal-article","created":{"date-parts":[[2018,10,29]],"date-time":"2018-10-29T12:02:18Z","timestamp":1540814538000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":124,"title":["Anserini"],"prefix":"10.1145","volume":"10","author":[{"given":"Peilin","family":"Yang","sequence":"first","affiliation":[{"name":"University of Delaware, Newark, DE, USA"}]},{"given":"Hui","family":"Fang","sequence":"additional","affiliation":[{"name":"University of Delaware, Newark, DE, USA"}]},{"given":"Jimmy","family":"Lin","sequence":"additional","affiliation":[{"name":"University of Waterloo, Waterloo, ON, Canada"}]}],"member":"320","published-online":{"date-parts":[[2018,10,29]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.21236\/ADA460118"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2215676.2215681"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/2393955.2393958"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582416"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2888422.2888439"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1571941.1572153"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646031"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2484028.2484132"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3084374"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3053408.3053421"},{"key":"e_1_2_1_11_1","volume-title":"OSWIR 2005 Workshop, Final Report.","author":"Beigbeder Michel","year":"2015"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 14th Text REtrieval Conference (TREC\u201905)","author":"Boldi Paolo","year":"2005"},{"key":"e_1_2_1_13_1","first-page":"85","article-title":"Implementation of the SMART Information Retrieval System","author":"Buckley Chris","year":"1985","journal-title":"Department of Computer Science TR"},{"key":"e_1_2_1_14_1","volume-title":"Overview of the TREC 2014 session track. In Proceedings of the 23rd Text REtrieval Conference (TREC\u201914)","author":"Carterette Ben","year":"2014"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the SIGIR 2012 Workshop on Open Source Information Retrieval. 25--31","author":"Cartright Marc-Allen","year":"2012"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080819"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-016-9279-1"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835449.1835490"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00018"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824114"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 4th Workshop on Evaluation Methods for Machine Learning at ICML.","author":"Drummond Chris","year":"2009"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2611178"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1076034.1076116"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964797.2964808"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2012.62"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882782"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/383952.383972"},{"key":"e_1_2_1_28_1","unstructured":"Hang Li. 2014. Learning to Rank for Information Retrieval and Natural Language Processing. Morgan 8 Claypool Publishers.   Hang Li. 2014. Learning to Rank for Information Retrieval and Natural Language Processing. Morgan 8 Claypool Publishers."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-30671-1_30"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 18th Text REtrieval Conference (TREC\u201909)","author":"Lin Jimmy","year":"2009"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808194.2809477"},{"key":"e_1_2_1_32_1","unstructured":"Jimmy Lin and Peilin Yang. 2018. Repeatability corner cases in document ranking: The impact of score ties. arXiv:1807.05798.  Jimmy Lin and Peilin Yang. 2018. Repeatability corner cases in document ranking: The impact of score ties. arXiv:1807.05798."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098011"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063576.2063584"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the SIGIR 2012 Workshop on Open Source Information Retrieval. 60--63","author":"Macdonald Craig","year":"2012"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148246"},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Jill P. Mesirov. 2010. Accessible reproducible research. Science 327 5964 (2010) 415--416.  Jill P. Mesirov. 2010. Accessible reproducible research. Science 327 5964 (2010) 415--416.","DOI":"10.1126\/science.1179653"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2004.05.001"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 13th Text REtrieval Conference (TREC\u201904)","author":"Metzler Donald"},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Bhaskar Mitra and Nick Craswell. 2017. Neural models for information retrieval. arXiv:1705.01509v1.  Bhaskar Mitra and Nick Craswell. 2017. Neural models for information retrieval. arXiv:1705.01509v1.","DOI":"10.1561\/1500000061"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2609460"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-31865-1_37"},{"key":"e_1_2_1_43_1","volume-title":"Industry Track Keynote at the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR\u201910).","author":"Pedersen Jan"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.1213847"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/290941.291008"},{"key":"e_1_2_1_46_1","doi-asserted-by":"crossref","unstructured":"Karthik Ram. 2013. Git can facilitate greater reproducibility and increased transparency in science. Source Code Biol. Med. 8 7 (2013).  Karthik Ram. 2013. Git can facilitate greater reproducibility and increased transparency in science. Source Code Biol. Med. 8 7 (2013).","DOI":"10.1186\/1751-0473-8-7"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 3rd Text REtrieval Conference (TREC\u201994)","author":"Robertson Stephen E.","year":"1994"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR\u201917)","author":"Sequiera Royal","year":"2017"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2433396.2433407"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2422256.2422269"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the SIGIR 2012 Workshop on Open Source Information Retrieval. 40--47","author":"Trotman Andrew","year":"2012"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2682862.2682863"},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR\u201917)","author":"Tu Zhucheng","year":"2017"},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the SIGIR 2012 Workshop on Open Source Information Retrieval. 64--67","author":"Turtle Howard"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2009934"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2970398.2970415"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080721"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/1189702.1189712"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/984321.984322"},{"key":"e_1_2_1_60_1","unstructured":"Stefan B\u00fcttcher Charles L. A. Clarke and Gordon V. Cormack. 2010. Information Retrieval: Implementing and Evaluating Search Engines. MIT Press.   Stefan B\u00fcttcher Charles L. A. Clarke and Gordon V. Cormack. 2010. Information Retrieval: Implementing and Evaluating Search Engines. MIT Press."},{"key":"e_1_2_1_61_1","volume-title":"Proceedings of the 13th Text REtrieval Conference (TREC'04)","author":"Billerbeck Bodo","year":"2004"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3239571","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3239571","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:08:20Z","timestamp":1750208900000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3239571"}},"subtitle":["Reproducible Ranking Baselines Using Lucene"],"short-title":[],"issued":{"date-parts":[[2018,10,29]]},"references-count":61,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,12,31]]}},"alternative-id":["10.1145\/3239571"],"URL":"https:\/\/doi.org\/10.1145\/3239571","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"value":"1936-1955","type":"print"},{"value":"1936-1963","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,10,29]]},"assertion":[{"value":"2017-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-10-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}