{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T14:49:51Z","timestamp":1776350991915,"version":"3.51.2"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,11,30]],"date-time":"2021-11-30T00:00:00Z","timestamp":1638230400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001659","name":"German Research Foundation","doi-asserted-by":"crossref","award":["139943784 DFG"],"award-info":[{"award-number":["139943784 DFG"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM\/IMS Trans. Data Sci."],"published-print":{"date-parts":[[2021,11,30]]},"abstract":"<jats:p>\n                    During the past 15 years, automatic text scaling has become one of the key tools of the Text as Data community in political science. Prominent text-scaling algorithms, however, rely on the assumption that latent positions can be captured just by leveraging the information about word frequencies in documents under study. We challenge this traditional view and present a new, semantically aware text-scaling algorithm,\n                    <jats:italic toggle=\"yes\">SemScale<\/jats:italic>\n                    , which combines recent developments in the area of computational linguistics with unsupervised graph-based clustering. 
We conduct an extensive quantitative analysis over a collection of speeches from the European Parliament in five different languages and from two different legislative terms, and we show that a scaling approach relying on semantic document representations is often better at capturing known underlying political dimensions than the established frequency-based (i.e., symbolic) scaling method. We further validate our findings through a series of experiments focused on text preprocessing and feature selection, document representation, scaling of party manifestos, and a supervised extension of our algorithm. To catalyze further research on this new branch of text-scaling methods, we release a Python implementation of\n                    <jats:italic toggle=\"yes\">SemScale<\/jats:italic>\n                    with all included datasets and evaluation procedures.\n                  <\/jats:p>","DOI":"10.1145\/3485666","type":"journal-article","created":{"date-parts":[[2022,5,17]],"date-time":"2022-05-17T09:52:21Z","timestamp":1652781141000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Political Text Scaling Meets Computational Semantics"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2484-4331","authenticated-orcid":false,"given":"Federico","family":"Nanni","sequence":"first","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, Mannheim, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1301-6314","authenticated-orcid":false,"given":"Goran","family":"Glava\u0161","sequence":"additional","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, Mannheim, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9615-6389","authenticated-orcid":false,"given":"Ines","family":"Rehbein","sequence":"additional","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, Mannheim, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7484-2049","authenticated-orcid":false,"given":"Simone Paolo","family":"Ponzetto","sequence":"additional","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, Mannheim, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0209-3859","authenticated-orcid":false,"given":"Heiner","family":"Stuckenschmidt","sequence":"additional","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, Mannheim, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,5,17]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.3115\/1609067.1609070"},{"key":"e_1_3_2_3_2","unstructured":"Ryan Bakker, Liesbet Hooghe, Seth Jolly, Gary Marks, Jonathan Polk, Jan Rovny, Marco Steenbergen, and Milada Anna Vachudova. 2020. 1999\u20132019 Chapel Hill Expert Survey Trend File. Version 1.2. Retrieved from chesdata.eu."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1080\/19312458.2019.1594741"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.electstud.2006.04.002"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.electstud.2006.03.010"},{"key":"e_1_3_2_8_2","unstructured":"Matthew J. Denny and Arthur Spirling. 2016. Assessing the Consequences of Text Preprocessing Decisions. 
Retrieved from https:\/\/ssrn.com\/abstract=2849145."},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1017\/pan.2017.44"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.4135\/9781526486387.n30"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.4135\/9781526486387.n30"},{"key":"e_1_3_2_12_2","volume-title":"Studies in Linguistic Analysis","author":"Firth J.","year":"1957","unstructured":"J. Firth. 1957. A synopsis of linguistic theory 1930\u20131955. In Studies in Linguistic Analysis. Philological Society, Oxford."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1277"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1111\/1467-9248.12015"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2017.11.041"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-2109"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1185"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-2029"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.5555\/3086952"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.31219\/osf.io\/ghxj8"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1093\/pan\/mps028"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/FADS.2017.8253197"},{"key":"e_1_3_2_23_2","volume-title":"Proceedings of the EPSA Annual Conference","author":"Hawkins Kirk A.","year":"2019","unstructured":"Kirk A. Hawkins, Rosario Aguilar, Bruno Castanho Silva, Erik K. Jenne, Bojana Kocijan, and Crist\u00f3bal Rovira Kaltwasser. 2019. Measuring populist discourse: The global populism database. 
In Proceedings of the EPSA Annual Conference."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1177\/2053168015580476"},{"key":"e_1_3_2_25_2","volume-title":"Proceedings of the 6th Workshop on Automated Knowledge Base Construction (AKBC)","author":"Joulin Armand","year":"2017","unstructured":"Armand Joulin, Edouard Grave, Piotr Bojanowski, Maximilian Nickel, and Tomas Mikolov. 2017. Fast linear model for knowledge graph embeddings. In Proceedings of the 6th Workshop on Automated Knowledge Base Construction (AKBC)."},{"key":"e_1_3_2_26_2","first-page":"169","article-title":"Coder training: Key to enhancing reliability and validity","volume":"3","author":"Lacewell Onawa P.","year":"2013","unstructured":"Onawa P. Lacewell and Annika Werner. 2013. Coder training: Key to enhancing reliability and validity. Map. Polic. Pref. Texts 3 (2013), 169\u2013194.","journal-title":"Map. Polic. Pref. Texts"},{"key":"e_1_3_2_27_2","first-page":"282","volume-title":"Proceedings of the 18th International Conference on Machine Learning","author":"Lafferty John D.","year":"2001","unstructured":"John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning. 282\u2013289."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1030"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1017\/S0003055403000698"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1017\/S0003055403000698"},{"key":"e_1_3_2_31_2","first-page":"1188","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Le Quoc","year":"2014","unstructured":"Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning. 
1188\u20131196."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1093\/pan\/mpn004"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1093\/pan\/mpt002"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1939-9162.2010.00006.x"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1101"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-5010"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.5555\/311445"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1093\/pan\/10.2.134"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1093\/pan\/mpm010"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1177\/2053168016643346"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1093\/pan\/mpr047"},{"key":"e_1_3_2_42_2","first-page":"3111","volume-title":"Proceedings of the International Conference on Advances in Neural Information Processing Systems","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 3111\u20133119."},{"key":"e_1_3_2_43_2","first-page":"59","volume-title":"Proceedings of the LREC Workshop ParlaCLARIN","author":"Nanni Federico","year":"2018","unstructured":"Federico Nanni, Goran Glava\u0161, Simone Paolo Ponzetto, Sara Tonelli, Nicol\u00f2 Conti, Ahmet Aker, Alessio Palmero Aprosio, Arnim Bleier, Benedetta Carlotti, Theresa Gessler, Tim Henrichsen, Dirk Hovy, Christian Kahmann, Mladen Karan, Akitaka Matsuo, Stefano Menini, Dong Nguyen, Andreas Niekler, Lisa Posch, Federico Vegetti, Zeerak Waseem, Tanya Whyte, and Nikoleta Yordanova. 2018. Findings from the hackathon on understanding Euroscepticism through the lens of textual data. 
In Proceedings of the LREC Workshop ParlaCLARIN. 59\u201366."},{"key":"e_1_3_2_44_2","unstructured":"Federico Nanni, Goran Glava\u0161, Simone Paolo Ponzetto, and Heiner Stuckenschmidt. 2019. Online Appendix: Political Text Scaling Meets Computational Semantics. Retrieved from https:\/\/federiconanni.com\/semantic-scaling\/."},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2015.2483592"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1177\/1354068820927686"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_48_2","unstructured":"Patrick O. Perry and Kenneth Benoit. 2017. Scaling Text with the Class Affinity Model. Retrieved from https:\/\/arxiv.org\/abs\/1710.08963."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/N15-1026"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1177\/2053168016686915"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.2307\/2111172"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1108\/eb046814"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1017\/S0007123409990299"},{"issue":"28","key":"e_1_3_2_54_2","first-page":"112","article-title":"Word embeddings for the analysis of ideological placement in parliamentary corpora","volume":"1","author":"Rheault Ludovic","year":"2019","unstructured":"Ludovic Rheault and Christopher Cochrane. 2019. Word embeddings for the analysis of ideological placement in parliamentary corpora. Polit. Anal. 1, 28 (2019), 112\u2013133.","journal-title":"Polit. Anal."},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","unstructured":"Pedro Rodriguez and Arthur Spirling. Forthcoming. Word embeddings: What works, what doesn\u2019t, and how to tell the difference for applied research. J. Polit. 
DOI:https:\/\/doi.org\/10.1086\/715162","DOI":"10.1086\/715162"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1540-5907.2008.00338.x"},{"key":"e_1_3_2_58_2","first-page":"1117","volume-title":"Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers","author":"Wachsmuth Henning","year":"2017","unstructured":"Henning Wachsmuth, Benno Stein, and Yamen Ajjour. 2017. \u201cPageRank\u201d for argument relevance. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. Association for Computational Linguistics, 1117\u20131127. Retrieved from https:\/\/www.aclweb.org\/anthology\/E17-1105."},{"key":"e_1_3_2_59_2","first-page":"412","volume-title":"Proceedings of the 14th International Conference on Machine Learning","author":"Yang Yiming","year":"1997","unstructured":"Yiming Yang and Jan O. Pedersen. 1997. A comparative study on feature selection in text categorization. In Proceedings of the 14th International Conference on Machine Learning. 412\u2013420."},{"key":"e_1_3_2_60_2","first-page":"912","volume-title":"Proceedings of the 20th International Conference on Machine Learning","author":"Zhu Xiaojin","year":"2003","unstructured":"Xiaojin Zhu, Zoubin Ghahramani, and John D. Lafferty. 2003. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning. 
912\u2013919."}],"container-title":["ACM\/IMS Transactions on Data Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3485666","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3485666","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T13:55:48Z","timestamp":1776347748000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3485666"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,30]]},"references-count":59,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,11,30]]}},"alternative-id":["10.1145\/3485666"],"URL":"https:\/\/doi.org\/10.1145\/3485666","relation":{},"ISSN":["2691-1922"],"issn-type":[{"value":"2691-1922","type":"print"}],"subject":[],"published":{"date-parts":[[2021,11,30]]},"assertion":[{"value":"2020-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-05-17","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}