{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T10:05:29Z","timestamp":1773655529998,"version":"3.50.1"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2019,5,31]],"date-time":"2019-05-31T00:00:00Z","timestamp":1559260800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2019,7,31]]},"abstract":"<jats:p>\n            <jats:italic>Cross-lingual Text Classification<\/jats:italic>\n            (CLC) consists of automatically classifying, according to a common set\n            <jats:italic>C<\/jats:italic>\n            of classes, documents each written in one of a set of languages\n            <jats:italic>L<\/jats:italic>\n            , and doing so more accurately than when \u201cna\u00efvely\u201d classifying each document via its corresponding language-specific classifier. To obtain an increase in the classification accuracy for a given language, the system thus needs to also leverage the training examples written in the other languages. We tackle \u201cmultilabel\u201d CLC via\n            <jats:italic>funnelling<\/jats:italic>\n            , a new ensemble learning method that we propose here. Funnelling consists of generating a two-tier classification system where all documents, irrespective of language, are classified by the same (second-tier) classifier. For this classifier, all documents are represented in a common, language-independent feature space consisting of the posterior probabilities generated by first-tier, language-dependent classifiers. 
This allows the classification of all test documents, of any language, to benefit from the information present in all training documents, of any language. We present substantial experiments, run on publicly available multilingual text collections, in which funnelling is shown to significantly outperform a number of state-of-the-art baselines. All code and datasets (in vector form) are made publicly available.\n          <\/jats:p>","DOI":"10.1145\/3326065","type":"journal-article","created":{"date-parts":[[2019,6,3]],"date-time":"2019-06-03T12:23:16Z","timestamp":1559564596000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["Funnelling"],"prefix":"10.1145","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5725-4322","authenticated-orcid":false,"given":"Andrea","family":"Esuli","sequence":"first","affiliation":[{"name":"Consiglio Nazionale delle Ricerche, Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0377-1025","authenticated-orcid":false,"given":"Alejandro","family":"Moreo","sequence":"additional","affiliation":[{"name":"Consiglio Nazionale delle Ricerche, Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4221-6427","authenticated-orcid":false,"given":"Fabrizio","family":"Sebastiani","sequence":"additional","affiliation":[{"name":"Consiglio Nazionale delle Ricerche, Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,5,31]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-30671-1_59"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45175-4_13"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3389\/fninf.2016.00049"},{"key":"e_1_2_1_4_1","volume-title":"Pattern Recognition and Machine 
Learning","author":"Bishop Christopher M.","unstructured":"Christopher M. Bishop . 2006. Pattern Recognition and Machine Learning . Springer , Heidelberg . Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer, Heidelberg."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1018054314350"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008640732416"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)","author":"Conneau Alexis","year":"2018","unstructured":"Alexis Conneau , Guillaume Lample , Marc\u2019Aurelio Ranzato , Ludovic Denoyer , and Herv\u00e9 J\u00e9gou . 2018 . Word translation without parallel data . In Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918) . Alexis Conneau, Guillaume Lample, Marc\u2019Aurelio Ranzato, Ludovic Denoyer, and Herv\u00e9 J\u00e9gou. 2018. Word translation without parallel data. In Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)."},{"key":"e_1_2_1_8_1","volume-title":"Khoshgoftaar","author":"Day Oscar","year":"2017","unstructured":"Oscar Day and Taghi M . Khoshgoftaar . 2017 . A survey on heterogeneous transfer learning. J. Big Data 4 (2017), Article 17 (1--42). Oscar Day and Taghi M. Khoshgoftaar. 2017. A survey on heterogeneous transfer learning. J. Big Data 4 (2017), Article 17 (1--42)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.2307\/2987588"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the AAAI Spring Symposium on Cross-language Text and Speech Retrieval. 18--24","author":"Dumais Susan T.","unstructured":"Susan T. Dumais , Todd A. Letsche , Michael L. Littman , and Thomas K. Landauer . 1997. Automatic cross-language retrieval using latent semantic indexing . In Proceedings of the AAAI Spring Symposium on Cross-language Text and Speech Retrieval. 18--24 . Susan T. 
Dumais, Todd A. Letsche, Michael L. Littman, and Thomas K. Landauer. 1997. Automatic cross-language retrieval using latent semantic indexing. In Proceedings of the AAAI Spring Symposium on Cross-language Text and Speech Retrieval. 18--24."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:MACH.0000015881.36452.6e"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/E14-1049"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/E14-1044"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the 13th International Conference on Machine Learning (ICML\u201996)","author":"Freund Yoav","unstructured":"Yoav Freund and Robert E. Schapire . 1996. Experiments with a new boosting algorithm . In Proceedings of the 13th International Conference on Machine Learning (ICML\u201996) . 148--156. Yoav Freund and Robert E. Schapire. 1996. Experiments with a new boosting algorithm. In Proceedings of the 13th International Conference on Machine Learning (ICML\u201996). 148--156."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/1625275.1625535"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/2946645.2946704"},{"key":"e_1_2_1_17_1","first-page":"43","article-title":"Multilingual approaches to text categorisation","volume":"5","author":"Garc\u00eda Adeva Juan Jos\u00e9","year":"2005","unstructured":"Juan Jos\u00e9 Garc\u00eda Adeva , Rafael A. Calvo , and Diego L\u00f3pez de Ipi\u0144a . 2005 . Multilingual approaches to text categorisation . Eur. J. Info. Prof. 5 , 3 (2005), 43 -- 51 . Juan Jos\u00e9 Garc\u00eda Adeva, Rafael A. Calvo, and Diego L\u00f3pez de Ipi\u0144a. 2005. Multilingual approaches to text categorisation. Eur. J. Info. Prof. 5, 3 (2005), 43--51.","journal-title":"Eur. J. Info. 
Prof."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-24775-3_5"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915)","author":"Gouws Stephan","year":"2015","unstructured":"Stephan Gouws , Yoshua Bengio , and Greg Corrado . 2015 . Bilbowa: Fast bilingual distributed representations without word alignments . In Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915) . 748--756. Stephan Gouws, Yoshua Bengio, and Greg Corrado. 2015. Bilbowa: Fast bilingual distributed representations without word alignments. In Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915). 748--756."},{"key":"e_1_2_1_20_1","unstructured":"Alex Graves. 2013. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850.  Alex Graves. 2013. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1162\/0899766042321814"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/28.3-4.321"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 24th International Conference on Computational Linguistics (COLING\u201912)","author":"Klementiev Alexandre","year":"2012","unstructured":"Alexandre Klementiev , Ivan Titov , and Binod Bhattarai . 2012 . Inducing crosslingual distributed representations of words . In Proceedings of the 24th International Conference on Computational Linguistics (COLING\u201912) . 1459--1474. Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In Proceedings of the 24th International Conference on Computational Linguistics (COLING\u201912). 
1459--1474."},{"key":"e_1_2_1_25_1","volume-title":"Combining Pattern Classifiers: Methods and Algorithms","author":"Kuncheva Ludmila I.","unstructured":"Ludmila I. Kuncheva . 2004. Combining Pattern Classifiers: Methods and Algorithms . John Wiley & Sons, Hoboken, NJ. Ludmila I. Kuncheva. 2004. Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons, Hoboken, NJ."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/1005332.1005345"},{"key":"e_1_2_1_27_1","unstructured":"Tomas Mikolov Quoc V. Le and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. arXiv:1309.4168.  Tomas Mikolov Quoc V. Le and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. arXiv:1309.4168."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS\u201913)","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov , Ilya Sutskever , Kai Chen , Gregory S. Corrado , and Jeffrey Dean . 2013 . Distributed representations of words and phrases and their compositionality . In Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS\u201913) . 3111--3119. Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS\u201913). 
3111--3119."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1699571.1699627"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/3013558.3013563"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/3176748.3176752"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1631272.1631393"},{"key":"e_1_2_1_33_1","volume-title":"Transfer learning for text mining","author":"Pan Weike","unstructured":"Weike Pan , Erheng Zhong , and Qiang Yang . 2012. Transfer learning for text mining . In Mining Text Data, Charu C. Aggarwal and ChengXiang Zhai (Eds.). Springer , Heidelberg , 223--258. Weike Pan, Erheng Zhong, and Qiang Yang. 2012. Transfer learning for text mining. In Mining Text Data, Charu C. Aggarwal and ChengXiang Zhai (Eds.). Springer, Heidelberg, 223--258."},{"key":"e_1_2_1_34_1","volume-title":"Advances in Large Margin Classifiers, Alexander Smola, Peter Bartlett","author":"Platt John C.","unstructured":"John C. Platt . 2000. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods . In Advances in Large Margin Classifiers, Alexander Smola, Peter Bartlett , Bernard Sch\u00f6lkopf , and Dale Schuurmans (Eds.). MIT Press , Cambridge, MA, 61--74. John C. Platt. 2000. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In Advances in Large Margin Classifiers, Alexander Smola, Peter Bartlett, Bernard Sch\u00f6lkopf, and Dale Schuurmans (Eds.). MIT Press, Cambridge, MA, 61--74."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/1858681.1858795"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/WI.2005.29"},{"key":"e_1_2_1_37_1","unstructured":"Sebastian Ruder Ivan Vuli\u0107 and Anders S\u00f8gaard. 2017. A survey of cross-lingual embedding models. arXiv:1706.04902v2.  Sebastian Ruder Ivan Vuli\u0107 and Anders S\u00f8gaard. 2017. A survey of cross-lingual embedding models. 
arXiv:1706.04902v2."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the Workshop on Methods and Applications of Semantic Indexing.","author":"Sahlgren Magnus","year":"2005","unstructured":"Magnus Sahlgren . 2005 . An introduction to random indexing . In Proceedings of the Workshop on Methods and Applications of Semantic Indexing. Magnus Sahlgren. 2005. An introduction to random indexing. In Proceedings of the Workshop on Methods and Applications of Semantic Indexing."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220425"},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the 6th Conference on Empirical Methods in Natural Language Processing (EMNLP\u201901)","author":"Sakkis Georgios","year":"2001","unstructured":"Georgios Sakkis , Ion Androutsopoulos , Georgios Paliouras , Vangelis Karkaletsis , Constantine D. Spyropoulos , and Panagiotis Stamatopoulos . 2001 . Stacking classifiers for anti-spam filtering of e-mail . In Proceedings of the 6th Conference on Empirical Methods in Natural Language Processing (EMNLP\u201901) . 44--50. Georgios Sakkis, Ion Androutsopoulos, Georgios Paliouras, Vangelis Karkaletsis, Constantine D. Spyropoulos, and Panagiotis Stamatopoulos. 2001. Stacking classifiers for anti-spam filtering of e-mail. In Proceedings of the 6th Conference on Empirical Methods in Natural Language Processing (EMNLP\u201901). 44--50."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808194.2809449"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI\u201916)","author":"Song Yangqiu","year":"2016","unstructured":"Yangqiu Song , Shyam Upadhyay , Haoruo Peng , and Dan Roth . 2016 . Cross-lingual dataless classification for many languages . In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI\u201916) . 2901--2907. Yangqiu Song, Shyam Upadhyay, Haoruo Peng, and Dan Roth. 2016. 
Cross-lingual dataless classification for many languages. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI\u201916). 2901--2907."},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the 2008 Cross-Language Evaluation Forum (CLEF\u201908)","author":"Sorg Philipp","year":"2008","unstructured":"Philipp Sorg and Philipp Cimiano . 2008 . Cross-language information retrieval with explicit semantic analysis . In Proceedings of the 2008 Cross-Language Evaluation Forum (CLEF\u201908) . Philipp Sorg and Philipp Cimiano. 2008. Cross-language information retrieval with explicit semantic analysis. In Proceedings of the 2008 Cross-Language Evaluation Forum (CLEF\u201908)."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2012.02.003"},{"key":"e_1_2_1_45_1","unstructured":"Ralf Steinberger Bruno Pouliquen Anna Widiger Camelia Ignat Tomaz Erjavec Dan Tufis and D\u00e1niel Varga. 2006. The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. (2006). CoRR abs\/cs\/0609058.  Ralf Steinberger Bruno Pouliquen Anna Widiger Camelia Ignat Tomaz Erjavec Dan Tufis and D\u00e1niel Varga. 2006. The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. (2006). CoRR abs\/cs\/0609058."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622859.1622868"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.4018\/jdwm.2007070101"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1157"},{"key":"e_1_2_1_49_1","volume-title":"Encyclopedia of Machine Learning, Claude Sammut and Geoffrey I","author":"Vilalta Ricardo","unstructured":"Ricardo Vilalta , Christophe Giraud-Carrier , Pavel Brazdil , and Carlos Soares . 2011. Inductive transfer . In Encyclopedia of Machine Learning, Claude Sammut and Geoffrey I . Webb (Eds.). Springer , Heidelberg , 545--548. Ricardo Vilalta, Christophe Giraud-Carrier, Pavel Brazdil, and Carlos Soares. 2011. 
Inductive transfer. In Encyclopedia of Machine Learning, Claude Sammut and Geoffrey I. Webb (Eds.). Springer, Heidelberg, 545--548."},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the 16th Annual Conference on Neural Information Processing Systems (NIPS\u201902)","author":"Vinokourov Alexei","year":"2002","unstructured":"Alexei Vinokourov , John Shawe-Taylor , and Nello Cristianini . 2002 . Inferring a semantic representation of text via cross-language correlation analysis . In Proceedings of the 16th Annual Conference on Neural Information Processing Systems (NIPS\u201902) . 1473--1480. Alexei Vinokourov, John Shawe-Taylor, and Nello Cristianini. 2002. Inferring a semantic representation of text via cross-language correlation analysis. In Proceedings of the 16th Annual Conference on Neural Information Processing Systems (NIPS\u201902). 1473--1480."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.5555\/1687878.1687913"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0893-6080(05)80023-1"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.5555\/1005332.1016791"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3326065","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3326065","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:53:08Z","timestamp":1750204388000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3326065"}},"subtitle":["A New Ensemble Method for Heterogeneous Transfer Learning and Its Application to Cross-Lingual Text 
Classification"],"short-title":[],"issued":{"date-parts":[[2019,5,31]]},"references-count":53,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2019,7,31]]}},"alternative-id":["10.1145\/3326065"],"URL":"https:\/\/doi.org\/10.1145\/3326065","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,5,31]]},"assertion":[{"value":"2018-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-05-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}