{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T11:41:27Z","timestamp":1761824487385,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":32,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T00:00:00Z","timestamp":1679875200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,3,27]]},"DOI":"10.1145\/3555776.3577715","type":"proceedings-article","created":{"date-parts":[[2023,6,7]],"date-time":"2023-06-07T17:16:29Z","timestamp":1686158189000},"page":"929-935","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["BERTEPro : A new Sentence Embedding Framework for the Education and Professional Training domain"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5176-5036","authenticated-orcid":false,"given":"Guillaume","family":"Lefebvre","sequence":"first","affiliation":[{"name":"LIRIS Lab, Universit\u00e9 Lyon 1, Lyon, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6546-1567","authenticated-orcid":false,"given":"Haytham","family":"Elghazel","sequence":"additional","affiliation":[{"name":"LIRIS Lab, Universit\u00e9 Lyon 1, Lyon, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6884-7855","authenticated-orcid":false,"given":"Th\u00e9odore","family":"Guillet","sequence":"additional","affiliation":[{"name":"Inokufu, Lyon, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8797-5403","authenticated-orcid":false,"given":"Alexandre","family":"Aussem","sequence":"additional","affiliation":[{"name":"LIRIS Lab, Universit\u00e9 Lyon 1, Lyon, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8349-6467","authenticated-orcid":false,"given":"Matthieu","family":"Sonnati","sequence":"additional","affiliation":[{"name":"Inokufu, Lyon, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,6,7]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval@ACL 2017","author":"Cer Daniel M.","year":"2017","unstructured":"Daniel M. Cer , Mona T. Diab , Eneko Agirre , I\u00f1igo Lopez-Gazpio , and Lucia Specia . 2017 . SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation . In Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval@ACL 2017 , Vancouver, Canada , August 3-4, 2017. 1--14. Daniel M. Cer, Mona T. Diab, Eneko Agirre, I\u00f1igo Lopez-Gazpio, and Lucia Specia. 2017. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval@ACL 2017, Vancouver, Canada, August 3-4, 2017. 1--14."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-4012"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1269"},{"key":"e_1_3_2_1_4_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019 , Minneapolis, MN, USA , June 2-7, 2019, Volume 1 (Long and Short Papers). 4171--4186. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers). 4171--4186."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2205.12335"},{"key":"e_1_3_2_1_6_1","volume-title":"word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722","author":"Goldberg Yoav","year":"2014","unstructured":"Yoav Goldberg and Omer Levy . 2014. word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722 ( 2014 ). Yoav Goldberg and Omer Levy. 2014. word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722 (2014)."},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020","author":"Gururangan Suchin","year":"2020","unstructured":"Suchin Gururangan , Ana Marasovic , Swabha Swayamdipta , Kyle Lo , Iz Beltagy , Doug Downey , and Noah A. Smith . 2020. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks . In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 , Online , July 5-10, 2020 . 8342--8360. Suchin Gururangan, Ana Marasovic, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. 2020. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020. 8342--8360."},{"key":"e_1_3_2_1_8_1","volume-title":"Long short-term memory. Neural computation 9, 8","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber . 1997. Long short-term memory. Neural computation 9, 8 ( 1997 ), 1735--1780. Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735--1780."},{"key":"e_1_3_2_1_9_1","volume-title":"Intelligent Data Engineering and Automated Learning - IDEAL","author":"Huertas-Garc\u00eda \u00c1lvaro","year":"2021","unstructured":"\u00c1lvaro Huertas-Garc\u00eda , Javier Huertas-Tato , Alejandro Mart\u00edn , and David Camacho . 2021. Countering Misinformation Through Semantic-Aware Multilingual Models . In Intelligent Data Engineering and Automated Learning - IDEAL 2021 , Hujun Yin, David Camacho , Peter Tino, Richard Allmendinger, Antonio J. Tall\u00f3n-Ballesteros, Ke Tang, Sung-Bae Cho, Paulo Novais, and Susana Nascimento (Eds.). Springer International Publishing , Cham, 312--323. \u00c1lvaro Huertas-Garc\u00eda, Javier Huertas-Tato, Alejandro Mart\u00edn, and David Camacho. 2021. Countering Misinformation Through Semantic-Aware Multilingual Models. In Intelligent Data Engineering and Automated Learning - IDEAL 2021, Hujun Yin, David Camacho, Peter Tino, Richard Allmendinger, Antonio J. Tall\u00f3n-Ballesteros, Ke Tang, Sung-Bae Cho, Paulo Novais, and Susana Nascimento (Eds.). Springer International Publishing, Cham, 312--323."},{"key":"e_1_3_2_1_10_1","volume-title":"A statistical interpretation of term specificity and its application in retrieval. Journal of documentation","author":"Jones Karen Sparck","year":"1972","unstructured":"Karen Sparck Jones . 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of documentation ( 1972 ). Karen Sparck Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of documentation (1972)."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","DOI":"10.1145\/3366423","volume-title":"Proceedings of The 12th Language Resources and Evaluation Conference, LREC 2020","author":"Le Hang","year":"2020","unstructured":"Hang Le , Lo\u00efc Vial , Jibril Frej , Vincent Segonne , Maximin Coavoux , Benjamin Lecouteux , Alexandre Allauzen , Beno\u00eet Crabb\u00e9 , Laurent Besacier , and Didier Schwab . 2020 . FlauBERT: Unsupervised Language Model Pre-training for French . In Proceedings of The 12th Language Resources and Evaluation Conference, LREC 2020 , Marseille, France , May 11-16, 2020. 2479--2490. Hang Le, Lo\u00efc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Beno\u00eet Crabb\u00e9, Laurent Besacier, and Didier Schwab. 2020. FlauBERT: Unsupervised Language Model Pre-training for French. In Proceedings of The 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, May 11-16, 2020. 2479--2490."},{"key":"e_1_3_2_1_12_1","volume-title":"Very deep transformers for neural machine translation. arXiv preprint arXiv:2008.07772","author":"Liu Xiaodong","year":"2020","unstructured":"Xiaodong Liu , Kevin Duh , Liyuan Liu , and Jianfeng Gao . 2020. Very deep transformers for neural machine translation. arXiv preprint arXiv:2008.07772 ( 2020 ). Xiaodong Liu, Kevin Duh, Liyuan Liu, and Jianfeng Gao. 2020. Very deep transformers for neural machine translation. arXiv preprint arXiv:2008.07772 (2020)."},{"key":"e_1_3_2_1_13_1","volume-title":"Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu , Myle Ott , Naman Goyal , Jingfei Du , Mandar Joshi , Danqi Chen , Omer Levy , Mike Lewis , Luke Zettlemoyer , and Veselin Stoyanov . 2019 . Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019). Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2663792.2663793"},{"key":"e_1_3_2_1_15_1","volume-title":"ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 8129--8133","author":"Ma Rao","year":"2020","unstructured":"Rao Ma , Lesheng Jin , Qi Liu , Lu Chen , and Kai Yu . 2020 . Addressing the polysemy problem in language modeling with attentional multi-sense embeddings . In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 8129--8133 . Rao Ma, Lesheng Jin, Qi Liu, Lu Chen, and Kai Yu. 2020. Addressing the polysemy problem in language modeling with attentional multi-sense embeddings. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 8129--8133."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.3115\/1599081.1599147"},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020","author":"Martin Louis","year":"2020","unstructured":"Louis Martin , Benjamin M\u00fcller , Pedro Javier Ortiz Su\u00e1rez , Yoann Dupont , Laurent Romary , \u00c9ric de la Clergerie, Djam\u00e9 Seddah, and Beno\u00eet Sagot. 2020. CamemBERT: a Tasty French Language Model . In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 , Online , July 5-10, 2020 . 7203--7219. Louis Martin, Benjamin M\u00fcller, Pedro Javier Ortiz Su\u00e1rez, Yoann Dupont, Laurent Romary, \u00c9ric de la Clergerie, Djam\u00e9 Seddah, and Beno\u00eet Sagot. 2020. CamemBERT: a Tasty French Language Model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020. 7203--7219."},{"key":"e_1_3_2_1_18_1","volume-title":"1st International Conference on Learning Representations, ICLR","author":"Mikolov Tom\u00e1s","year":"2013","unstructured":"Tom\u00e1s Mikolov , Kai Chen , Greg Corrado , and Jeffrey Dean . 2013. Efficient Estimation of Word Representations in Vector Space . In 1st International Conference on Learning Representations, ICLR 2013 , Scottsdale, Arizona, USA , May 2-4, 2013, Workshop Track Proceedings . Tom\u00e1s Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings."},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings of the 2013 conference of the north american chapter of the association for computational linguistics: Human language technologies. 746--751","author":"Mikolov Tom\u00e1\u0161","year":"2013","unstructured":"Tom\u00e1\u0161 Mikolov , Wen-tau Yih, and Geoffrey Zweig . 2013 . Linguistic regularities in continuous space word representations . In Proceedings of the 2013 conference of the north american chapter of the association for computational linguistics: Human language technologies. 746--751 . Tom\u00e1\u0161 Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 conference of the north american chapter of the association for computational linguistics: Human language technologies. 746--751."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.03.091"},{"key":"e_1_3_2_1_21_1","volume-title":"6th Italian Conference on Computational Linguistics, CLiC-it","volume":"2481","author":"Polignano Marco","year":"2019","unstructured":"Marco Polignano , Pierpaolo Basile , Marco De Gemmis , Giovanni Semeraro , and Valerio Basile . 2019 . Alberto: Italian BERT language understanding model for NLP challenging tasks based on tweets . In 6th Italian Conference on Computational Linguistics, CLiC-it 2019, Vol. 2481 . CEUR, 1--6. Marco Polignano, Pierpaolo Basile, Marco De Gemmis, Giovanni Semeraro, and Valerio Basile. 2019. Alberto: Italian BERT language understanding model for NLP challenging tasks based on tweets. In 6th Italian Conference on Computational Linguistics, CLiC-it 2019, Vol. 2481. CEUR, 1--6."},{"key":"e_1_3_2_1_22_1","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans Ilya Sutskever etal 2018. Improving language understanding by generative pre-training. (2018).  Alec Radford Karthik Narasimhan Tim Salimans Ilya Sutskever et al. 2018. Improving language understanding by generative pre-training. (2018)."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5120\/ijca2016907355"},{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 87--96","author":"Reimers Nils","year":"2016","unstructured":"Nils Reimers , Philip Beyer , and Iryna Gurevych . 2016 . Task-oriented intrinsic evaluation of semantic textual similarity . In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 87--96 . Nils Reimers, Philip Beyer, and Iryna Gurevych. 2016. Task-oriented intrinsic evaluation of semantic textual similarity. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 87--96."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_3_2_1_26_1","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020","author":"Reimers Nils","year":"2020","unstructured":"Nils Reimers and Iryna Gurevych . 2020 . Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020 , Online , November 16-20, 2020. 4512--4525. Nils Reimers and Iryna Gurevych. 2020. Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020. 4512--4525."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Stephen Robertson Hugo Zaragoza etal 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends\u00ae in Information Retrieval 3 4 (2009) 333--389.  Stephen Robertson Hugo Zaragoza et al. 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends \u00ae in Information Retrieval 3 4 (2009) 333--389.","DOI":"10.1561\/1500000019"},{"key":"e_1_3_2_1_28_1","volume-title":"Gottbert: a pure german language model. arXiv preprint arXiv:2012.02110","author":"Scheible Raphael","year":"2020","unstructured":"Raphael Scheible , Fabian Thomczyk , Patric Tippmann , Victor Jaravine , and Martin Boeker . 2020. Gottbert: a pure german language model. arXiv preprint arXiv:2012.02110 ( 2020 ). Raphael Scheible, Fabian Thomczyk, Patric Tippmann, Victor Jaravine, and Martin Boeker. 2020. Gottbert: a pure german language model. arXiv preprint arXiv:2012.02110 (2020)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","first-page":"13248","DOI":"10.1109\/ACCESS.2021.3052783","article-title":"A survey of the state-of-the-art models in neural abstractive text summarization","volume":"9","author":"Syed Ayesha Ayub","year":"2021","unstructured":"Ayesha Ayub Syed , Ford Lumban Gaol , and Tokuro Matsuo . 2021 . A survey of the state-of-the-art models in neural abstractive text summarization . IEEE Access 9 (2021), 13248 -- 13265 . Ayesha Ayub Syed, Ford Lumban Gaol, and Tokuro Matsuo. 2021. A survey of the state-of-the-art models in neural abstractive text summarization. IEEE Access 9 (2021), 13248--13265.","journal-title":"IEEE Access"},{"key":"e_1_3_2_1_30_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N Gomez , \u0141ukasz Kaiser , and Illia Polosukhin . 2017. Attention is all you need. Advances in neural information processing systems 30 ( 2017 ). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_31_1","volume-title":"GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In 7th International Conference on Learning Representations, ICLR 2019","author":"Wang Alex","year":"2019","unstructured":"Alex Wang , Amanpreet Singh , Julian Michael , Felix Hill , Omer Levy , and Samuel R. Bowman . 2019 . GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In 7th International Conference on Learning Representations, ICLR 2019 , New Orleans, LA, USA , May 6-9, 2019 . Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019."},{"key":"e_1_3_2_1_32_1","volume-title":"Comparative study of CNN and RNN for natural language processing. arXiv preprint arXiv:1702.01923","author":"Yin Wenpeng","year":"2017","unstructured":"Wenpeng Yin , Katharina Kann , Mo Yu , and Hinrich Sch\u00fctze . 2017. Comparative study of CNN and RNN for natural language processing. arXiv preprint arXiv:1702.01923 ( 2017 ). Wenpeng Yin, Katharina Kann, Mo Yu, and Hinrich Sch\u00fctze. 2017. Comparative study of CNN and RNN for natural language processing. arXiv preprint arXiv:1702.01923 (2017)."}],"event":{"name":"SAC '23: 38th ACM\/SIGAPP Symposium on Applied Computing","sponsor":["SIGAPP ACM Special Interest Group on Applied Computing"],"location":"Tallinn Estonia","acronym":"SAC '23"},"container-title":["Proceedings of the 38th ACM\/SIGAPP Symposium on Applied Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3555776.3577715","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3555776.3577715","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:08:24Z","timestamp":1750183704000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3555776.3577715"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,27]]},"references-count":32,"alternative-id":["10.1145\/3555776.3577715","10.1145\/3555776"],"URL":"https:\/\/doi.org\/10.1145\/3555776.3577715","relation":{},"subject":[],"published":{"date-parts":[[2023,3,27]]},"assertion":[{"value":"2023-06-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}