{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T02:37:38Z","timestamp":1769049458352,"version":"3.49.0"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2010,11,1]],"date-time":"2010-11-01T00:00:00Z","timestamp":1288569600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2010,11]]},"abstract":"<jats:p>This article is set in the context of searching text and image repositories by keyword. We develop a unified probabilistic framework for text, image, and combined text and image retrieval that is based on the detection of keywords (concepts) using automated image annotation technology. Our framework is deeply rooted in information theory and lends itself to use with other media types.<\/jats:p>\n          <jats:p>We estimate a statistical model in a multimodal feature space for each possible query keyword. The key element of our framework is to identify feature space transformations that make them comparable in complexity and density. We select the optimal multimodal feature space with a minimum description length criterion from a set of candidate feature spaces that are computed with the average-mutual-information criterion for the text part and hierarchical expectation maximization for the visual part of the data. 
We evaluate our approach in three retrieval experiments (only text retrieval, only image retrieval, and text combined with image retrieval), verify the framework's low computational complexity, and compare with existing state-of-the-art ad-hoc models.<\/jats:p>","DOI":"10.1145\/1852102.1852105","type":"journal-article","created":{"date-parts":[[2010,11,23]],"date-time":"2010-11-23T15:00:38Z","timestamp":1290524438000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["An information-theoretic framework for semantic-multimedia retrieval"],"prefix":"10.1145","volume":"28","author":[{"given":"Jo\u00e3o","family":"Magalh\u00e3es","sequence":"first","affiliation":[{"name":"Universidade Nova de Lisboa, Lisbon, Portugal"}]},{"given":"Stefan","family":"R\u00fcger","sequence":"additional","affiliation":[{"name":"Knowledge Media Institute, The Open University, Milton Keynes, UK"}]}],"member":"320","published-online":{"date-parts":[[2010,11,23]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the TREC Video Retrieval Evaluation Workshop.","author":"Amir A.","unstructured":"Amir , A. , Argillander , J. O. , Campbell , M. , Haubold , A. , Iyengar , G. , Ebabdollahi , S. , Kang , F. , Naphade , M. , Natsev , A. , Smith , J. R. , Tesic , J. , and Volkmer , T . 2005. IBM Research TRECVID-2005 video retrieval system . In Proceedings of the TREC Video Retrieval Evaluation Workshop. Amir, A., Argillander, J. O., Campbell, M., Haubold, A., Iyengar, G., Ebabdollahi, S., Kang, F., Naphade, M., Natsev, A., Smith, J. R., Tesic, J., and Volkmer, T. 2005. IBM Research TRECVID-2005 video retrieval system. In Proceedings of the TREC Video Retrieval Evaluation Workshop."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Proceessing.","author":"Argillander J.","unstructured":"Argillander , J. , Iyengar , G. , and Nock , H . 
2005. Semantic annotation of multimedia using maximum entropy models . In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Proceessing. Argillander, J., Iyengar, G., and Nock, H. 2005. Semantic annotation of multimedia using maximum entropy models. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Proceessing."},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the International Conference on Computer Vision.","author":"Barnard K.","unstructured":"Barnard , K. , and Forsyth , D. A . 2001. Learning the semantics of words and pictures . In Proceedings of the International Conference on Computer Vision. Barnard, K., and Forsyth, D. A. 2001. Learning the semantics of words and pictures. In Proceedings of the International Conference on Computer Vision."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.86996"},{"key":"e_1_2_1_5_1","first-page":"39","article-title":"A maximum entropy approach to natural language processing","volume":"37","author":"Berger A.","year":"1996","unstructured":"Berger , A. , Pietra , S. , and Pietra , V. 1996 . A maximum entropy approach to natural language processing . Computational Ling. 37 , 39 -- 71 . Berger, A., Pietra, S., and Pietra, V. 1996. A maximum entropy approach to natural language processing. Computational Ling. 37, 39--71.","journal-title":"Computational Ling."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860460"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2005.164"},{"key":"e_1_2_1_8_1","volume-title":"-Q","author":"Chang S.-F.","year":"2005","unstructured":"Chang , S.-F. , Hsu , W. , Kennedy , L. , Xie , L. , Yanagawa , A. , Zavesky , E. , and Zhang , D . -Q . 2005 . Columbia University TRECVID- 2005 video search and high-level feature extraction. In Proceedings of TRECVID. Chang, S.-F., Hsu, W., Kennedy, L., Xie, L., Yanagawa, A., Zavesky, E., and Zhang, D.-Q. 2005. 
Columbia University TRECVID-2005 video search and high-level feature extraction. In Proceedings of TRECVID."},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Chen S. F. and Rosenfeld R. 1999. A Gaussian prior for smoothing maximum entropy models. Tech. rep. Carnegie Mellon University Pittsburgh. PA.  Chen S. F. and Rosenfeld R. 1999. A Gaussian prior for smoothing maximum entropy models. Tech. rep. Carnegie Mellon University Pittsburgh. PA.","DOI":"10.21236\/ADA360974"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Cover T. M. and Thomas J. A. 1991. Elements of Information Theory. John Wiley & Sons.   Cover T. M. and Thomas J. A. 1991. Elements of Information Theory. John Wiley & Sons.","DOI":"10.1002\/0471200611"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the European Conference on Computer Vision.","author":"Duygulu P.","unstructured":"Duygulu , P. , Barnard , K. , de Freitas , N. , and Forsyth , D . 2002. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary . In Proceedings of the European Conference on Computer Vision. Duygulu, P., Barnard, K., de Freitas, N., and Forsyth, D. 2002. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In Proceedings of the European Conference on Computer Vision."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Feng S. L.","unstructured":"Feng , S. L. , Lavrenko , V. , and Manmatha , R . 2004. Multiple Bernoulli relevance models for image and video annotation . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Feng, S. L., Lavrenko, V., and Manmatha, R. 2004. Multiple Bernoulli relevance models for image and video annotation. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.990138"},{"key":"e_1_2_1_14_1","unstructured":"Forman G. 2003. An extensive empirical study of feature selection metrics for text classification. Mach. Learn. Res. 1289--1305.   Forman G. 2003. An extensive empirical study of feature selection metrics for text classification. Mach. Learn. Res. 1289--1305."},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Hastie T. Tibshirani R. and Friedman J. 2001. The Elements of Statistical Learning: Data Mining Inference and Prediction Springer Series in Statistics: Springer.","DOI":"10.1007\/978-0-387-21606-5"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/312624.312649"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860459"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the International Conference on Image and Video Retrieval.","author":"Jeon J.","unstructured":"Jeon , J. , and Manmatha , R . 2004. Using maximum entropy for automatic image annotation . In Proceedings of the International Conference on Image and Video Retrieval. Jeon, J., and Manmatha, R. 2004. Using maximum entropy for automatic image annotation. In Proceedings of the International Conference on Image and Video Retrieval."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the International Conference on Machine Learning.","author":"Joachims T.","year":"1997","unstructured":"Joachims , T. 1997 . A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization . In Proceedings of the International Conference on Machine Learning. Joachims, T. 1997. 
A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. In Proceedings of the International Conference on Machine Learning."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/645326.649721"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence.","author":"Kohavi R.","year":"1995","unstructured":"Kohavi , R. 1995 . A study of cross-validation and bootstrap for accuracy estimation and model selection . In Proceedings of the International Joint Conference on Artificial Intelligence. Kohavi, R. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the International Joint Conference on Artificial Intelligence."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the Neural Information Processing System Conference.","author":"Lavrenko V.","unstructured":"Lavrenko , V. , Manmatha , R. , and Jeon , J . 2003. A model for learning the semantics of pictures . In Proceedings of the Neural Information Processing System Conference. Lavrenko, V., Manmatha, R., and Jeon, J. 2003. A model for learning the semantics of pictures. In Proceedings of the Neural Information Processing System Conference."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2005.10"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01589116"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01589116"},{"key":"e_1_2_1_26_1","volume-title":"Information Theory, Inference, and Learning Algorithms","author":"MacKay D. J C.","unstructured":"MacKay , D. J C. 2004. Information Theory, Inference, and Learning Algorithms . Cambridge University Press . MacKay, D. J C. 2004. Information Theory, Inference, and Learning Algorithms. 
Cambridge University Press."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1282280.1282368"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118853.1118871"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the AAAI Workshop on Learning for Text Categorization.","author":"McCallum A.","unstructured":"McCallum , A. , and Nigam , K . 1998. A comparison of event models for naive Bayes text classification . In Proceedings of the AAAI Workshop on Learning for Text Categorization. McCallum, A., and Nigam, K. 1998. A comparison of event models for naive Bayes text classification. In Proceedings of the AAAI Workshop on Learning for Text Categorization."},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"McCullagh P. and Nelder J. A. 1989. Generalized Linear Models. 2nd Ed: Chapman and Hall.  McCullagh P. and Nelder J. A. 1989. Generalized Linear Models. 2nd Ed: Chapman and Hall.","DOI":"10.1007\/978-1-4899-3242-6"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/6046.909601"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the IJCAI - Workshop on Machine Learning for Information Filtering.","author":"Nigam K.","year":"1999","unstructured":"Nigam , K. , Lafferty , J. , and McCallum , A. 1999 . Using maximum entropy for text classification . In Proceedings of the IJCAI - Workshop on Machine Learning for Information Filtering. Nigam, K., Lafferty, J., and McCallum, A. 1999. Using maximum entropy for text classification. In Proceedings of the IJCAI - Workshop on Machine Learning for Information Filtering."},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Nocedal J. and Wright S. J. 1999. Numerical Optimization. Springer-Verlag.  Nocedal J. and Wright S. J. 1999. Numerical Optimization. 
Springer-Verlag.","DOI":"10.1007\/b98874"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb046814"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/0005-1098(78)90005-5"},{"key":"e_1_2_1_37_1","volume-title":"G. Salton (Ed) The SMART Retrieval System: Experiments in Automatic Text Retrieval","author":"Rocchio J.","unstructured":"Rocchio , J. 1971. Relevance feedback in information retrieval . In G. Salton (Ed) The SMART Retrieval System: Experiments in Automatic Text Retrieval , Prentice-Hall . Rocchio, J. 1971. Relevance feedback in information retrieval. In G. Salton (Ed) The SMART Retrieval System: Experiments in Automatic Text Retrieval, Prentice-Hall."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073445.1073473"},{"key":"e_1_2_1_39_1","volume-title":"The MediaMill TRECVID 2006 semantic video search engine. In Proceedings of the TREC Video Retrieval Evaluation Workshop.","author":"Snoek C. G. M., v.","unstructured":"Snoek , C. G. M., v. Gemert , J. C. , Gevers , T. , Huurnink , B. , Koelma , D. C., v. Liemp , M., d. Rooij , O., d. Sande , Seinstra, F. J., Smeulder , A. W. M. , Thean , A. H. C. , Veenman , C. J. , and Worring , M . 2006 . The MediaMill TRECVID 2006 semantic video search engine. In Proceedings of the TREC Video Retrieval Evaluation Workshop. Snoek, C. G. M., v. Gemert, J. C., Gevers, T., Huurnink, B., Koelma, D. C., v. Liemp, M., d. Rooij, O., d. Sande, Seinstra, F. J., Smeulder, A. W. M., Thean, A. H. C., Veenman, C. J., and Worring, M. 2006. The MediaMill TRECVID 2006 semantic video search engine. 
In Proceedings of the TREC Video Retrieval Evaluation Workshop."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.212"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/83.892448"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860461"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the TREC Video Retrieval Evaluation Workshop.","author":"Westerveld T.","unstructured":"Westerveld , T. , de Vrie , A. P. , Ianeva , T. , Boldareva , L. , and Hiemstra , D . 2003. Combining information sources for video retrieval . In Proceedings of the TREC Video Retrieval Evaluation Workshop. Westerveld, T., de Vrie, A. P., Ianeva, T., Boldareva, L., and Hiemstra, D. 2003. Combining information sources for video retrieval. In Proceedings of the TREC Video Retrieval Evaluation Workshop."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009982220290"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/183422.183424"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/312624.312647"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the Conference on Machine Learning.","author":"Yang Y.","unstructured":"Yang , Y. , and Pedersen , J. O . 1997. A comparative study on feature selection in text categorization . In Proceedings of the Conference on Machine Learning. Yang, Y., and Pedersen, J. O. 1997. A comparative study on feature selection in text categorization. 
In Proceedings of the Conference on Machine Learning."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/11526346_54"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1011441423217"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1852102.1852105","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1852102.1852105","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T12:08:17Z","timestamp":1750248497000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1852102.1852105"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,11]]},"references-count":48,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2010,11]]}},"alternative-id":["10.1145\/1852102.1852105"],"URL":"https:\/\/doi.org\/10.1145\/1852102.1852105","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,11]]},"assertion":[{"value":"2008-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-11-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}