{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,1]],"date-time":"2026-03-01T05:29:19Z","timestamp":1772342959963,"version":"3.50.1"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2022,12,21]],"date-time":"2022-12-21T00:00:00Z","timestamp":1671580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"crossref","award":["823914, 871042 and 951911"],"award-info":[{"award-number":["823914, 871042 and 951911"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2023,4,30]]},"abstract":"<jats:p><jats:italic>Funnelling<\/jats:italic> (<jats:sc>Fun<\/jats:sc>) is a recently proposed method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different and language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The meta-classifier can thus exploit class-class correlations, and this (among other things) gives <jats:sc>Fun<\/jats:sc> an edge over CLTC systems in which these correlations cannot be brought to bear. 
In this article, we describe <jats:italic>Generalized Funnelling<\/jats:italic> (<jats:sc>gFun<\/jats:sc>), a generalization of <jats:sc>Fun<\/jats:sc> consisting of an HTL architecture in which 1st-tier components can be arbitrary <jats:italic>view-generating functions<\/jats:italic>, i.e., language-dependent functions that each produce a language-independent representation (\u201cview\u201d) of the (monolingual) document. We describe an instance of <jats:sc>gFun<\/jats:sc> in which the meta-classifier receives as input a vector of calibrated posterior probabilities (as in <jats:sc>Fun<\/jats:sc>) aggregated to other embedded representations that embody other types of correlations, such as word-class correlations (as encoded by <jats:italic>Word-Class Embeddings<\/jats:italic>), word-word correlations (as encoded by <jats:italic>Multilingual Unsupervised or Supervised Embeddings<\/jats:italic>), and word-context correlations (as encoded by <jats:italic>multilingual BERT<\/jats:italic>). We show that this instance of <jats:sc>gFun<\/jats:sc> substantially improves over <jats:sc>Fun<\/jats:sc> and over state-of-the-art baselines by reporting experimental results obtained on two large, standard datasets for multilingual multilabel text classification. 
Our code that implements <jats:sc>gFun<\/jats:sc> is publicly available.<\/jats:p>","DOI":"10.1145\/3544104","type":"journal-article","created":{"date-parts":[[2022,6,13]],"date-time":"2022-06-13T12:24:09Z","timestamp":1655123049000},"page":"1-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification"],"prefix":"10.1145","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0377-1025","authenticated-orcid":false,"given":"Alejandro","family":"Moreo","sequence":"first","affiliation":[{"name":"Consiglio Nazionale delle Ricerche, Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2322-7043","authenticated-orcid":false,"given":"Andrea","family":"Pedrotti","sequence":"additional","affiliation":[{"name":"Consiglio Nazionale delle Ricerche, Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4221-6427","authenticated-orcid":false,"given":"Fabrizio","family":"Sebastiani","sequence":"additional","affiliation":[{"name":"Consiglio Nazionale delle Ricerche, Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,12,21]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.5555\/1046920.1194905"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.3115\/981574.981595"},{"key":"e_1_3_2_4_2","volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)","author":"Arora Sanjeev","year":"2017","unstructured":"Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. 
In Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1250"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1042"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45175-4_13"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944966"},{"key":"e_1_3_2_9_2","first-page":"155","volume-title":"Proceedings of the 16th International Conference of the Pacific Association for Computational Linguistics (PACLING\u201919)","author":"Chen Guan-Yuan","year":"2019","unstructured":"Guan-Yuan Chen and Von-Wun Soo. 2019. Deep domain adaptation for low-resource cross-lingual text classification tasks. In Proceedings of the 16th International Conference of the Pacific Association for Computational Linguistics (PACLING\u201919). 155\u2013168."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1299"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_3_2_12_2","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)","author":"Conneau Alexis","year":"2018","unstructured":"Alexis Conneau, Guillaume Lample, Marc\u2019Aurelio Ranzato, Ludovic Denoyer, and Herv\u00e9 J\u00e9gou. 2018. Word translation without parallel data. In Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)."},{"key":"e_1_3_2_13_2","volume-title":"Proceedings of the 40th Annual Meeting of the Cognitive Science Society (CogSci\u201918)","author":"Dasgupta Ishita","year":"2018","unstructured":"Ishita Dasgupta, Demi Guo, Andreas Stuhlm\u00fcller, Samuel Gershman, and Noah D. Goodman. 2018. Evaluating compositionality in sentence embeddings. 
In Proceedings of the 40th Annual Meeting of the Cognitive Science Society (CogSci\u201918)."},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-017-0089-0"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2016.06.012"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"e_1_3_2_17_2","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Multilingual BERT Readme Document. Retrieved from https:\/\/github.com\/google-research\/bert\/blob\/a9ba4b8d7704c1ae18d1b28c56c0430d41407eb1\/multilingual.md."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-5661-9_5"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1572"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3326065"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/e14-1049"},{"issue":"3","key":"e_1_3_2_23_2","first-page":"43","article-title":"Multilingual approaches to text categorisation","volume":"5","author":"Adeva Juan Jos\u00e9 Garc\u00eda","year":"2005","unstructured":"Juan Jos\u00e9 Garc\u00eda Adeva, Rafael A. Calvo, and Diego L\u00f3pez de Ipi\u00f1a. 2005. Multilingual approaches to text categorisation. Eur. J. Inform. Profess. 5, 3 (2005), 43\u201351.","journal-title":"Eur. J. Inform. Profess."},{"key":"e_1_3_2_24_2","first-page":"723","article-title":"A kernel two-sample test","volume":"13","author":"Gretton Arthur","year":"2012","unstructured":"Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Sch\u00f6lkopf, and Alexander Smola. 2012. A kernel two-sample test. J. Mach. Learn. Res. 13 (2012), 723\u2013773.","journal-title":"J. Mach. Learn. 
Res."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.coling-main.542"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-009-8467-7_1"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_28_2","first-page":"448","volume-title":"Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915)","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915). 448\u2013456."},{"key":"e_1_3_2_29_2","first-page":"385","volume-title":"Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa\u201921)","author":"Isbister Tim","year":"2021","unstructured":"Tim Isbister, Fredrik Carlsson, and Magnus Sahlgren. 2021. Should we stop training more monolingual models, and simply use machine translation instead?. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa\u201921). 385\u2013390."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.560"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-50011-1_34"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.323"},{"key":"e_1_3_2_33_2","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. 
In Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)."},{"key":"e_1_3_2_34_2","first-page":"7057","volume-title":"Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS\u201919)","author":"Lample Guillaume","year":"2019","unstructured":"Guillaume Lample and Alexis Conneau. 2019. Cross-lingual language model pretraining. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS\u201919). 7057\u20137067."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1037\/0033-295X.104.2.211"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00134"},{"key":"e_1_3_2_37_2","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919)","author":"Loshchilov Ilya","year":"2019","unstructured":"Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919)."},{"key":"e_1_3_2_38_2","first-page":"6294","volume-title":"Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS\u201917)","author":"McCann Bryan","year":"2017","unstructured":"Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: Contextualized word vectors. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS\u201917). 6294\u20136305."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K16-1006"},{"key":"e_1_3_2_40_2","unstructured":"Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013. Exploiting Similarities among Languages for Machine Translation. 
arXiv:1309.4168."},{"key":"e_1_3_2_41_2","first-page":"746","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL\u201913)","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Wen-Tau Yih, and Geoffrey Zweig. 2013. Linguistic regularities in continuous space word representations. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL\u201913). 746\u2013751."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.3115\/1699571.1699627"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.4762"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.5194"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-020-00735-3"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3412841.3442093"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00259"},{"key":"e_1_3_2_48_2","volume-title":"Heterogeneous Document Embeddings for Multi-lingual Text Classification","author":"Pedrotti Andrea","year":"2020","unstructured":"Andrea Pedrotti. 2020. Heterogeneous Document Embeddings for Multi-lingual Text Classification. Master\u2019s Thesis. University of Pisa, Pisa, IT."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},{"key":"e_1_3_2_51_2","doi-asserted-by":"crossref","first-page":"61","DOI":"10.7551\/mitpress\/1113.003.0008","volume-title":"Advances in Large Margin Classifiers","author":"Platt John C.","year":"2000","unstructured":"John C. Platt. 2000. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In Advances in Large Margin Classifiers, Alexander Smola, Peter Bartlett, Bernard Sch\u00f6lkopf, and Dale Schuurmans (Eds.). 
The MIT Press, Cambridge, MA, 61\u201374."},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1.11640"},{"key":"e_1_3_2_53_2","volume-title":"The Word-space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-dimensional Vector Spaces","author":"Sahlgren Magnus","year":"2006","unstructured":"Magnus Sahlgren. 2006. The Word-space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-dimensional Vector Spaces. Ph.D. Dissertation. Swedish Institute for Computer Science, University of Stockholm, Stockholm, SE."},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6393(00)00094-7"},{"key":"e_1_3_2_55_2","first-page":"895","volume-title":"Proceedings of the 6th Conference on Neural Information Processing Systems (NIPS\u201993)","author":"Sch\u00fctze Hinrich","year":"1993","unstructured":"Hinrich Sch\u00fctze. 1993. Word space. In Proceedings of the 6th Conference on Neural Information Processing Systems (NIPS\u201993). 895\u2013902."},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/2808194.2809449"},{"key":"e_1_3_2_57_2","volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)","author":"Smith Samuel L.","year":"2017","unstructured":"Samuel L. Smith, David H. P. Turban, Steven Hamblin, and Nils Y. Hammerla. 2017. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)."},{"key":"e_1_3_2_58_2","first-page":"4077","volume-title":"Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS\u201917)","author":"Snell Jake","year":"2017","unstructured":"Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical networks for few-shot learning. 
In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS\u201917). 4077\u20134087."},{"key":"e_1_3_2_59_2","first-page":"129","volume-title":"Proceedings of the 28th International Conference on Machine Learning (ICML\u201911)","author":"Socher Richard","year":"2011","unstructured":"Richard Socher, Cliff C. Lin, Andrew Y. Ng, and Christopher D. Manning. 2011. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML\u201911). 129\u2013136."},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2012.02.003"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.168"},{"key":"e_1_3_2_62_2","first-page":"5998","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS\u201917)","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS\u201917). 5998\u20136008."},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-30164-8_401"},{"key":"e_1_3_2_64_2","first-page":"1473","volume-title":"Proceedings of the 16th Annual Conference on Neural Information Processing Systems (NIPS\u201902)","author":"Vinokourov Alexei","year":"2002","unstructured":"Alexei Vinokourov, John Shawe-Taylor, and Nello Cristianini. 2002. Inferring a semantic representation of text via cross-language correlation analysis. In Proceedings of the 16th Annual Conference on Neural Information Processing Systems (NIPS\u201902). 
1473\u20131480."},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1216"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-short.78"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/N15-1104"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.315"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3291124"},{"key":"e_1_3_2_71_2","first-page":"7404","volume-title":"Proceedings of the 36th International Conference on Machine Learning (ICML\u201919)","author":"Zhang Yuchen","year":"2019","unstructured":"Yuchen Zhang, Tianle Liu, Mingsheng Long, and Michael Jordan. 2019. Bridging theory and algorithm for domain adaptation. In Proceedings of the 36th International Conference on Machine Learning (ICML\u201919). 7404\u20137413."}],"container-title":["ACM Transactions on Information 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3544104","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3544104","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:20Z","timestamp":1750178780000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3544104"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,21]]},"references-count":70,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,4,30]]}},"alternative-id":["10.1145\/3544104"],"URL":"https:\/\/doi.org\/10.1145\/3544104","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,21]]},"assertion":[{"value":"2021-07-21","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-08","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-12-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}