{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T00:07:15Z","timestamp":1755907635193,"version":"3.44.0"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","funder":[{"name":"FCT - Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","award":["SFRH\/BD\/151437\/2021"],"award-info":[{"award-number":["SFRH\/BD\/151437\/2021"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,7,18]]},"DOI":"10.1145\/3731120.3744593","type":"proceedings-article","created":{"date-parts":[[2025,7,18]],"date-time":"2025-07-18T13:34:06Z","timestamp":1752845646000},"page":"264-274","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Zero-Shot and Hybrid Strategies for Tetun Ad-Hoc Text Retrieval"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4392-2382","authenticated-orcid":false,"given":"Gabriel","family":"de Jesus","sequence":"first","affiliation":[{"name":"INESC TEC \/ University of Porto, Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8835-5845","authenticated-orcid":false,"given":"Siddharth A.K.","family":"Singh","sequence":"additional","affiliation":[{"name":"IRLab \/ University of Amsterdam, Amsterdam, Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2693-988X","authenticated-orcid":false,"given":"S\u00e9rgio","family":"Nunes","sequence":"additional","affiliation":[{"name":"INESC TEC \/ University of Porto, Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5970-880X","authenticated-orcid":false,"given":"Andrew","family":"Yates","sequence":"additional","affiliation":[{"name":"HLTCOE \/ Johns Hopkins University, Baltimore, Maryland, 
USA"}]}],"member":"320","published-online":{"date-parts":[[2025,7,18]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582416"},{"key":"e_1_3_2_1_2_1","volume-title":"MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. arXiv:1611.09268 [cs.CL] https:\/\/arxiv.org\/abs\/1611.09268","author":"Bajaj Payal","year":"2018","unstructured":"Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. 2018. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. arXiv:1611.09268 [cs.CL] https:\/\/arxiv.org\/abs\/1611.09268"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/345508.345576"},{"key":"e_1_3_2_1_4_1","volume-title":"mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset. CoRR","author":"Bonifacio Luiz Henrique","year":"2021","unstructured":"Luiz Henrique Bonifacio, Israel Campiotti, Roberto A. Lotufo, and Rodrigo Nogueira. 2021. mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset. CoRR, Vol. abs\/2108.13897 (2021). arXiv:2108.13897 https:\/\/arxiv.org\/abs\/2108.13897"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-99736-6_7"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1571941.1572114"},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of The Twenty-Eighth Text REtrieval Conference (TREC","author":"Craswell Nick","year":"2019","unstructured":"Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and Ellen M. Voorhees. 2019. Overview of the TREC 2019 Deep Learning Track. In Proceedings of The Twenty-Eighth Text REtrieval Conference (TREC 2019). https:\/\/trec.nist.gov\/pubs\/trec28\/papers\/OVERVIEW.DL.pdf"},{"key":"e_1_3_2_1_8_1","volume-title":"Overview of the TREC 2023 Deep Learning Track. 
In The Thirty-Second Text REtrieval Conference Proceedings (TREC 2023","author":"Craswell Nick","year":"2023","unstructured":"Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Hossein A. Rahmani, Daniel Campos, Jimmy Lin, Ellen M. Voorhees, and Ian Soboroff. 2023. Overview of the TREC 2023 Deep Learning Track. In The Thirty-Second Text REtrieval Conference Proceedings (TREC 2023), Gaithersburg, MD, USA, November 14-17, 2023, (NIST Special Publication, Vol. 500-xxx), Ian Soboroff and Angela Ellis, (Eds.). National Institute of Standards and Technology (NIST). https:\/\/trec.nist.gov\/pubs\/trec32\/papers\/Overview_deep.pdf"},{"key":"e_1_3_2_1_9_1","unstructured":"W. Bruce Croft Donald Metzler and Trevor Strohman. 2009. Search Engines - Information Retrieval in Practice. Pearson Education. http:\/\/www.search-engines-book.com\/"},{"key":"e_1_3_2_1_10_1","first-page":"19","volume-title":"Proceedings of The First Workshop on Large Language Models for Evaluation in Information Retrieval (LLM4Eval 2024), co-located with the 10th International Conference on Online Publishing (SIGIR","volume":"3752","author":"de Jesus Gabriel","year":"2024","unstructured":"Gabriel de Jesus and S\u00e9rgio Nunes. 2024a. Exploring Large Language Models for Relevance Judgments in Tetun. In Proceedings of The First Workshop on Large Language Models for Evaluation in Information Retrieval (LLM4Eval 2024), co-located with the 10th International Conference on Online Publishing (SIGIR 2024), C. Siro, M. Aliannejadi, H.A. Rahmani, N. Craswell, C.L.A. Clarke, G. Faggioli, B. Mitra, P. Thomas, and E. Yilmaz, (Eds.), Vol. 3752. Washington D.C., USA, 19-30. https:\/\/ceur-ws.org\/Vol-3752\/"},{"key":"e_1_3_2_1_11_1","first-page":"177","volume-title":"Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024","author":"de Jesus Gabriel","year":"2024","unstructured":"Gabriel de Jesus and S\u00e9rgio Nunes. 2024b. 
Labadain-30k: A Monolingual Tetun Document-Level Audited Dataset. In Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, Maite Melero, Sakriani Sakti, and Claudia Soria, (Eds.). ELRA and ICCL, Torino, Italia, 177-188. https:\/\/aclanthology.org\/2024.sigul-1.22"},{"key":"e_1_3_2_1_12_1","unstructured":"Gabriel de Jesus and S\u00e9rgio Nunes. 2025a. Establishing a Foundation for Tetun Text Ad-Hoc Retrieval: Stemming Indexing Retrieval and Ranking. arXiv:2412.11758 [cs.IR] https:\/\/arxiv.org\/abs\/2412.11758"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","unstructured":"Gabriel de Jesus and S\u00e9rgio Nunes. 2025b. Labadain-Avaliad\u00f3r: A Test Collection for Tetun Ad-hoc Text Retrieval Task [Dataset]. https:\/\/doi.org\/10.25747\/2k6s-e518","DOI":"10.25747\/2k6s-e518"},{"key":"e_1_3_2_1_14_1","first-page":"4368","volume-title":"Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC\/COLING 2024","author":"de Jesus Gabriel","year":"2024","unstructured":"Gabriel de Jesus and S\u00e9rgio Sobral Nunes. 2024c. Data Collection Pipeline for Low-Resource Languages: A Case Study on Constructing a Tetun Text Corpus. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC\/COLING 2024, 20-25 May, 2024, Torino, Italy, Nicoletta Calzolari, Min-Yen Kan, V\u00e9ronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, (Eds.). ELRA and ICCL, 4368-4380. 
https:\/\/aclanthology.org\/2024.lrec-main.390"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.25747\/rfzx-m945"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/N19-1423"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000100"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2022.ACL-LONG.203"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-72113-8_10"},{"key":"e_1_3_2_1_20_1","unstructured":"Zuzana Greks\u00e1kov\u00e1. 2018. Tetun in Timor-Leste: The role of language contact in its development. Ph.D. Dissertation. Universidade de Coimbra Portugal. http:\/\/hdl.handle.net\/10316\/80665"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3486250"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1353\/ol.2019.0003"},{"key":"e_1_3_2_1_23_1","article-title":"Unsupervised Dense Information Retrieval with Contrastive","volume":"2022","author":"Izacard Gautier","year":"2022","unstructured":"Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. 2022. Unsupervised Dense Information Retrieval with Contrastive Learning. Trans. Mach. Learn. Res., Vol. 2022 (2022). https:\/\/openreview.net\/forum?id=jKN1pXi7b0","journal-title":"Learning. Trans. Mach. Learn. Res."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2410.21242"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2020.EMNLP-MAIN.550"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401075"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3624918.3625330"},{"key":"e_1_3_2_1_28_1","volume-title":"Leveraging Semantic and Lexical Matching to Improve the Recall of Document Retrieval Systems: A Hybrid Approach. 
CoRR","author":"Kuzi Saar","year":"2020","unstructured":"Saar Kuzi, Mingyang Zhang, Cheng Li, Michael Bendersky, and Marc Najork. 2020. Leveraging Semantic and Lexical Matching to Improve the Recall of Document Retrieval Systems: A Hybrid Approach. CoRR, Vol. abs\/2010.01195 (2020). arXiv:2010.01195 https:\/\/arxiv.org\/abs\/2010.01195"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-28244-7_33"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.ACL-LONG.746"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.2200\/S01123ED1V01Y202108HLT053"},{"key":"e_1_3_2_1_32_1","first-page":"4370","volume-title":"Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE","author":"Louis Antoine","year":"2025","unstructured":"Antoine Louis, Vageesh Kumar Saxena, Gijs van Dijck, and Gerasimos Spanakis. 2025. ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval. In Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE, January 19-24, 2025, Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, and Steven Schockaert, (Eds.). Association for Computational Linguistics, 4370-4383. https:\/\/aclanthology.org\/2025.coling-main.295\/"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-99736-6_41"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3409256.3409829"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000061"},{"key":"e_1_3_2_1_36_1","unstructured":"Tri Nguyen Mir Rosenberg Xia Song Jianfeng Gao Saurabh Tiwary Rangan Majumder and Li Deng. 2016. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. 
In Proceedings of the Workshop on Cognitive Computation: Integrating neural and symbolic approaches 2016 co-located with the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016) Barcelona Spain December 9 2016 (CEUR Workshop Proceedings Vol. 1773) Tarek Richard Besold Antoine Bordes Artur S. d'Avila Garcez and Greg Wayne (Eds.). CEUR-WS.org. https:\/\/ceur-ws.org\/Vol-1773\/CoCoNIPS_2016_paper9.pdf"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2020.FINDINGS-EMNLP.63"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-31865-1_37"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.FINDINGS-EMNLP.1057"},{"key":"e_1_3_2_1_40_1","first-page":"21","volume-title":"Proceedings of the Workshop on Language Modeling and Information Retrieval, J. Callan, B. Croft, and J. Lafferty, (Eds.)","author":"Robertson Stephen","year":"2001","unstructured":"Stephen Robertson and Djoerd Hiemstra. 2001. Language models and probability of relevance. In Proceedings of the Workshop on Language Modeling and Information Retrieval, J. Callan, B. Croft, and J. Lafferty, (Eds.). Carnegie Mellon University, United States, 21-25."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000019"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2022.NAACL-MAIN.272"},{"key":"e_1_3_2_1_43_1","volume-title":"Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021","author":"Thakur Nandan","year":"2021","unstructured":"Nandan Thakur, Nils Reimers, Andreas R\u00fcckl\u00e9, Abhishek Srivastava, and Iryna Gurevych. 2021. BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models. 
In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual, Joaquin Vanschoren and Sai-Kit Yeung, (Eds.). https:\/\/datasets-benchmarks-proceedings.neurips.cc\/paper\/2021\/hash\/65b9eea6e1cc6bb9f0cd2a47751a186f-Abstract-round2.html"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1515\/multi-2017-0109"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.15144\/PL-528"},{"key":"e_1_3_2_1_46_1","volume-title":"Ricardo Sousa da Cunha, Andreia Rute da Silva Baptista, Alexandre Corte-Real de Ara\u00fajo, Benedita McCrorie Gra\u00e7a Moura","author":"Bacelar de Vasconcelos Pedro Carlos","year":"2011","unstructured":"Pedro Carlos Bacelar de Vasconcelos, Andreia Sofia Pinto Oliveira, Ricardo Sousa da Cunha, Andreia Rute da Silva Baptista, Alexandre Corte-Real de Ara\u00fajo, Benedita McCrorie Gra\u00e7a Moura, Bernardo Almeida, Cl\u00e1udio Ximenes, Fernando Conde Monteiro, Henrique Curado, et al., 2011. Constitui\u00e7\u00e3o Anotada da Rep\u00fablica Democr\u00e1tica de Timor-Leste. http:\/\/hdl.handle.net\/10400.22\/4008"},{"key":"e_1_3_2_1_47_1","volume-title":"Proceedings of the Thirteenth Text REtrieval Conference, TREC 2004","author":"Voorhees Ellen M.","year":"2004","unstructured":"Ellen M. Voorhees. 2004. Overview of the TREC 2004 Robust Track. In Proceedings of the Thirteenth Text REtrieval Conference, TREC 2004, Gaithersburg, Maryland, USA, November 16-19, 2004, (NIST Special Publication, Vol. 500-261), Ellen M. Voorhees and Lori P. Buckland, (Eds.). National Institute of Standards and Technology (NIST). 
http:\/\/trec.nist.gov\/pubs\/trec13\/papers\/ROBUST.OVERVIEW.pdf"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3648471"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2020.EMNLP-DEMOS.6"},{"key":"e_1_3_2_1_50_1","volume-title":"Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. In 9th International Conference on Learning Representations, ICLR 2021","author":"Xiong Lee","year":"2021","unstructured":"Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul N. Bennett, Junaid Ahmed, and Arnold Overwijk. 2021. Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https:\/\/openreview.net\/forum?id=zeFrfgyZln"},{"key":"e_1_3_2_1_51_1","volume-title":"Simple Applications of BERT for Ad Hoc Document Retrieval. CoRR","author":"Yang Wei","year":"2019","unstructured":"Wei Yang, Haotian Zhang, and Jimmy Lin. 2019. Simple Applications of BERT for Ad Hoc Document Retrieval. CoRR, Vol. abs\/1903.10972 (2019). 
arXiv:1903.10972 http:\/\/arxiv.org\/abs\/1903.10972"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.mrl-1.12"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3613447"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1162\/TACL_A_00595"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.emnlp-main.250"}],"event":{"name":"ICTIR '25: International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval","sponsor":["SIGIR ACM Special Interest Group on Information Retrieval"],"location":"Padua Italy","acronym":"ICTIR '25"},"container-title":["Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR)"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3731120.3744593","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T13:17:45Z","timestamp":1755868665000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3731120.3744593"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,18]]},"references-count":55,"alternative-id":["10.1145\/3731120.3744593","10.1145\/3731120"],"URL":"https:\/\/doi.org\/10.1145\/3731120.3744593","relation":{},"subject":[],"published":{"date-parts":[[2025,7,18]]},"assertion":[{"value":"2025-07-18","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}