{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,3]],"date-time":"2022-04-03T10:11:30Z","timestamp":1648980690290},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2019,4,27]],"date-time":"2019-04-27T00:00:00Z","timestamp":1556323200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Spanish Government, project TUNER","award":["TIN2015-65308-C5-1-R"],"award-info":[{"award-number":["TIN2015-65308-C5-1-R"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Semantic Textual Similarity (STS), which measures the equivalence of meanings between two textual segments, is an important and useful task in Natural Language Processing. In this article, we have analyzed the datasets provided by the Semantic Evaluation (SemEval) 2012\u20132014 campaigns for this task in order to find out appropriate linguistic features for each dataset, taking into account the influence that linguistic features at different levels (e.g. syntactic constituents and lexical semantics) might have on the sentence similarity. Results indicate that a linguistic feature may have a different effect on different corpus due to the great difference in sentence structure and vocabulary between datasets. Thus, we conclude that the selection of linguistic features according to the genre of the text might be a good strategy for obtaining better results in the STS task. 
This analysis could be a useful reference for measuring system building and linguistic feature tuning.<\/jats:p>","DOI":"10.1093\/llc\/fqy076","type":"journal-article","created":{"date-parts":[[2019,1,17]],"date-time":"2019-01-17T20:11:31Z","timestamp":1547755891000},"page":"471-484","source":"Crossref","is-referenced-by-count":1,"title":["Linguistic analysis of datasets for semantic textual similarity"],"prefix":"10.1093","volume":"35","author":[{"given":"Chunlin","family":"Wang","sequence":"first","affiliation":[{"name":"Artificial Solutions Iberia S.L., Barcelona"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Irene","family":"Castell\u00f3n","sequence":"additional","affiliation":[{"name":"Departamento de Filolog\u00eda Catalana y Ling\u00fc\u00edstica General, Universidad de Barcelona, Gran Via de les Corts Catalanes, Barcelona"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Elisabet","family":"Comelles","sequence":"additional","affiliation":[{"name":"Departamento de Lenguas y Literaturas Modernas y de Estudios Ingleses, Universidad de Barcelona, Gran Via de les Corts Catalanes, Barcelona"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2019,4,27]]},"reference":[{"key":"2020052917400173000_fqy076-B1","doi-asserted-by":"crossref","first-page":"81","DOI":"10.3115\/v1\/S14-2010","article-title":"SemEval-2014 task 10: multilingual semantic textual similarity","author":"Agirre","year":"2014","journal-title":"Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014)"},{"key":"2020052917400173000_fqy076-B2","first-page":"32","volume-title":"Second Joint Conference on Lexical and Computational Semantics, Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity","author":"Agirre","year":"2013"},{"key":"2020052917400173000_fqy076-B3","first-page":"385","volume-title":"Proceedings of the First Joint Conference on Lexical 
and Computational Semantics-Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation","author":"Agirre","year":"2012"},{"key":"2020052917400173000_fqy076-B4","first-page":"86","volume-title":"Proceedings of the 17th international conference on Computational Linguistics-Volume 1","author":"Baker","year":"1998"},{"key":"2020052917400173000_fqy076-B5","first-page":"435","volume-title":"Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation","author":"B\u00e4r","year":"2012"},{"key":"2020052917400173000_fqy076-B6","volume-title":"Europe Media Monitor - System Description","author":"Best","year":"2005"},{"key":"2020052917400173000_fqy076-B7","doi-asserted-by":"crossref","first-page":"136","DOI":"10.3115\/1626355.1626373","article-title":"(Meta-) evaluation of machine translation","volume-title":"Proceedings of the Second Workshop on Statistical Machine Translation","author":"Callison-Burch","year":"2007"},{"key":"2020052917400173000_fqy076-B8","first-page":"70","article-title":"Further meta-evaluation of machine translation","volume-title":"In Proceedings of the Third Workshop on Statistical Machine Translation","author":"Callison-Burch","year":"2008"},{"key":"2020052917400173000_fqy076-B9","first-page":"190","volume-title":"Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1","author":"Chen","year":"2011"},{"key":"2020052917400173000_fqy076-B10","volume-title":"Language Resources and Evaluation","author":"Comelles","year":"2018"},{"key":"2020052917400173000_fqy076-B11","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1007\/11736790_9","volume-title":"Machine Learning Challenges. 
Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment","author":"Dagan","year":"2006"},{"key":"2020052917400173000_fqy076-B12","first-page":"350","volume-title":"Proceedings of the 20th International Conference on Computational Linguistics","author":"Dolan","year":"2004"},{"key":"2020052917400173000_fqy076-B13","first-page":"162","volume-title":"Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Dreyer","year":"2012"},{"key":"2020052917400173000_fqy076-B14","doi-asserted-by":"crossref","first-page":"423","DOI":"10.7551\/mitpress\/7287.001.0001","volume-title":"WordNet: An electronic lexical database","author":"Fellbaum","year":"1998"},{"key":"2020052917400173000_fqy076-B15","first-page":"45","volume-title":"Proceedings of the 11th Annual Research Colloquium of the UK Special Interest Group for Computational Linguistics","author":"Fernando","year":"2008"},{"key":"2020052917400173000_fqy076-B16","volume-title":"DEFT Phase 1 Narrative Text Source Data R1 LDC2013E19","author":"Garland","year":"2013"},{"key":"2020052917400173000_fqy076-B17","first-page":"57","volume-title":"Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers","author":"Hovy","year":"2006"},{"key":"2020052917400173000_fqy076-B18","first-page":"449","volume-title":"Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation","author":"Jimenez","year":"2012"},{"issue":"2","key":"2020052917400173000_fqy076-B19","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1037\/0033-295X.104.2.211","article-title":"A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of 
knowledge","volume":"104","author":"Landauer","year":"1997","journal-title":"Psychological Review"},{"key":"2020052917400173000_fqy076-B20","volume-title":"The Life, Letters and Labours of Francis Galton (Cambridge Library Collection - Darwin, Evolution and Genetics)","author":"Pearson","year":"2011"},{"key":"2020052917400173000_fqy076-B21","first-page":"441","volume-title":"Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation","author":"\u0160ari\u0107","year":"2012"},{"key":"2020052917400173000_fqy076-B22","first-page":"304","volume-title":"Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics","author":"Shinyama","year":"2006"},{"key":"2020052917400173000_fqy076-B23","doi-asserted-by":"crossref","first-page":"241","DOI":"10.3115\/v1\/S14-2039","article-title":"DLS@ CU: sentence similarity from word alignment","author":"Sultan","year":"2014","journal-title":"Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)"}],"container-title":["Digital Scholarship in the 
Humanities"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/dsh\/article-pdf\/35\/2\/471\/33324005\/fqy076.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/dsh\/article-pdf\/35\/2\/471\/33324005\/fqy076.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,5,29]],"date-time":"2020-05-29T22:33:10Z","timestamp":1590791590000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/dsh\/article\/35\/2\/471\/5481132"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,27]]},"references-count":23,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,4,27]]},"published-print":{"date-parts":[[2020,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/llc\/fqy076","relation":{},"ISSN":["2055-7671","2055-768X"],"issn-type":[{"value":"2055-7671","type":"print"},{"value":"2055-768X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,6]]},"published":{"date-parts":[[2019,4,27]]}}}