{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T01:52:03Z","timestamp":1769046723168,"version":"3.49.0"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2015,2,17]],"date-time":"2015-02-17T00:00:00Z","timestamp":1424131200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2015,2,26]]},"abstract":"<jats:p>Multimedia collections are more than ever growing in size and diversity. Effective multimedia retrieval systems are thus critical to access these datasets from the end-user perspective and in a scalable way. We are interested in repositories of image\/text multimedia objects and we study multimodal information fusion techniques in the context of content-based multimedia information retrieval. We focus on graph-based methods, which have proven to provide state-of-the-art performances. We particularly examine two such methods: cross-media similarities and random-walk-based scores. From a theoretical viewpoint, we propose a unifying graph-based framework, which encompasses the two aforementioned approaches. Our proposal allows us to highlight the core features one should consider when using a graph-based technique for the combination of visual and textual information. We compare cross-media and random-walk-based results using three different real-world datasets. 
From a practical standpoint, our extended empirical analyses allow us to provide insights and guidelines about the use of graph-based methods for multimodal information fusion in content-based multimedia information retrieval.<\/jats:p>","DOI":"10.1145\/2699668","type":"journal-article","created":{"date-parts":[[2015,2,18]],"date-time":"2015-02-18T13:24:05Z","timestamp":1424265845000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["Unsupervised Visual and Textual Information Fusion in CBMIR Using Graph-Based Methods"],"prefix":"10.1145","volume":"33","author":[{"given":"Julien","family":"Ah-Pine","sequence":"first","affiliation":[{"name":"University of Lyon, France"}]},{"given":"Gabriela","family":"Csurka","sequence":"additional","affiliation":[{"name":"Xerox Research Centre Europe, Meylan, France"}]},{"given":"St\u00e9phane","family":"Clinchant","sequence":"additional","affiliation":[{"name":"Xerox Research Centre Europe, Meylan, France"}]}],"member":"320","published-online":{"date-parts":[[2015,2,17]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-008-0246-8"},{"key":"e_1_2_1_2_1","volume-title":"Working Notes of CLEF","author":"Ah-Pine J.","year":"2008"},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"J. Ah-Pine S. Clinchant and G. Csurka. 2010. Comparison of several combinations of multimodal and diversity seeking methods for multimedia retrieval. In Multilingual Information Access Evaluation. Lecture Notes in Computer Science.
Springer.","DOI":"10.1007\/978-3-642-15751-6_13"},{"key":"e_1_2_1_4_1","volume-title":"Working Notes of the 2009 CLEF Workshop.","author":"Ah-Pine J."},{"key":"e_1_2_1_5_1","volume-title":"ImageCLEF- Experimental Evaluation in Visual Information Retrieval.","author":"Ah-Pine J."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/312624.312681"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-7552(98)00110-X"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-7552(98)00110-X"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2007.70801"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1743384.1743442"},{"key":"e_1_2_1_11_1","unstructured":"S. Clinchant G. Csurka J. Ah-Pine G. Jacquet F. Perronnin J. S\u00e1nchez and K. Minoukadeh. 2010. XRCE\u2019s participation in Wikipedia retrieval medical image modality classification and ad-hoc retrieval tasks of ImageCLEF 2010. In CLEF (Notebook Papers\/LABs\/Workshops)."},{"key":"e_1_2_1_12_1","volume-title":"ImageEval Workshop at CVIR.","author":"Clinchant S.","year":"2007"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835449.1835490"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/11735106_20"},{"key":"e_1_2_1_15_1","unstructured":"S. Clinchant J. M. Renders and G. Csurka. 2007. XRCE\u2019s participation to ImageCLEF.
In CLEF Working Notes."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-85760-0_71"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277784"},{"key":"e_1_2_1_18_1","doi-asserted-by":"crossref","unstructured":"G. Csurka and S. Clinchant. 2012. An empirical study of fusion operators for multimodal image retrieval. In CBMI.","DOI":"10.1109\/CBMI.2012.6269843"},{"key":"e_1_2_1_19_1","unstructured":"G. Csurka S. Clinchant and A. Popescu. 2011. XRCE\u2019s participation at Wikipedia retrieval of ImageCLEF 2011. In CLEF (Notebook Papers\/Labs\/Workshop)."},{"key":"e_1_2_1_20_1","volume-title":"ECCV Workshop on Statistical Learning for Computer Vision.","author":"Csurka G."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1460096.1460125"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1953122.1953146"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2012.2202676"},{"key":"e_1_2_1_24_1","volume-title":"International Conference on Language Resources and Evaluation","author":"Grubinger 
M."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1180639.1180654"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/MMUL.2007.61"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1291233.1291446"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/0020-0271(71)90051-9"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860459"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835449.1835505"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/324133.324140"},{"key":"e_1_2_1_32_1","volume-title":"IEEE Conference on Computer Vision & Pattern Recognition (CVPR\u201910)","author":"Krapac J.","year":"2010"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0036144503424786"},{"key":"e_1_2_1_34_1","unstructured":"V. Lavrenko R. Manmatha and J. Jeon. 2003. A model for learning the semantics of pictures. In NIPS."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2006.68"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000016"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2010.2051360"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1852102.1852105"},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","unstructured":"N. Maillot J.-P. Chevallet and J.-H. Lim. 2006. Inter-media pseudo-relevance feedback application to ImageCLEF 2006 photo retrieval. In CLEF C. Peters P. Clough F. C. Gey J. Karlgren B. 
Magnini D. W. Oard M. de Rijke and M. Stempfhuber (Eds.). Lecture Notes in Computer Science Vol. 4730. Springer 735--738.","DOI":"10.1007\/978-3-540-74999-8_92"},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the First International Workshop on Multimedia Intelligent Storage and Retrieval Management (MISRM'99)","author":"Mori Y."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2072298.2072367"},{"key":"e_1_2_1_42_1","doi-asserted-by":"crossref","unstructured":"H. M\u00fcller P. Clough Th. Deselaers and B. Caputo (Eds.). 2010. ImageCLEF- Experimental Evaluation in Visual Information Retrieval. Vol. INRE. Springer.","DOI":"10.1007\/978-3-642-15181-1"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/1291233.1291448"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1014052.1014135"},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"F. Perronnin and C. Dance. 2007. Fisher Kernels on visual vocabularies for image categorization. In CVPR. IEEE.","DOI":"10.1109\/CVPR.2007.383266"},{"key":"e_1_2_1_46_1","doi-asserted-by":"crossref","unstructured":"F. Perronnin J. S\u00e1nchez and T. Mensink. 2010. Improving the Fisher Kernel for large-scale image classification. In ECCV.","DOI":"10.1007\/978-3-642-15561-1_11"},{"key":"e_1_2_1_47_1","volume-title":"Working Notes of the 11th Workshop of the Cross-Language Evaluation Forum. CLEF-campaign. 
http:\/\/clef2010","author":"Popescu A.","year":"2010"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1873987"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2484028.2484144"},{"key":"e_1_2_1_50_1","doi-asserted-by":"crossref","unstructured":"S. Rueger. 2010. Multimedia Information Retrieval. Morgan and Claypool.","DOI":"10.1007\/978-3-031-02269-2"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0269888903000638"},{"key":"e_1_2_1_52_1","volume-title":"Video Google: A text retrieval approach to object matching in videos. In ICCV.","author":"Sivic J. S.","year":"2003"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1178677.1178722"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/1101149.1101236"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/1459359.1459378"},{"key":"e_1_2_1_56_1","volume-title":"Proceedings of the Fourth International Symposium on Independent Component Analysis and Blind Source Separation (ICA2003)","author":"Vinokourov A."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2009.2017400"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2009.2012919"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2012.2207397"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/1027527.1027746"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835449.1835556"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1873977"},{"key":"e_1_2_1_63_1","doi-asserted-by":"crossref","unstructured":"Z.-J. Zha M. Wang J. Shen and T.-S. Chua. 2012. Text mining in multimedia. In Mining Text Data. 
361--384.","DOI":"10.1007\/978-1-4614-3223-4_11"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2699668","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2699668","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:16:59Z","timestamp":1750227419000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2699668"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,2,17]]},"references-count":63,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2015,2,26]]}},"alternative-id":["10.1145\/2699668"],"URL":"https:\/\/doi.org\/10.1145\/2699668","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,2,17]]},"assertion":[{"value":"2013-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-02-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}