{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T10:16:26Z","timestamp":1742379386907},"reference-count":33,"publisher":"World Scientific Pub Co Pte Lt","issue":"03","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Image Grap."],"published-print":{"date-parts":[[2001,7]]},"abstract":"<jats:p> In this paper, we propose a new framework for the dynamic construction of structured visual object\/scene detectors for content-based retrieval. In the Visual Apprentice, a user defines visual object\/scene models via a multiple-level Definition Hierarchy: a scene consists of objects, which consist of object-parts, which consist of perceptual-areas, which consist of regions. The user trains the system by providing example images\/videos and labeling components according to the hierarchy she defines (e.g., image of two people shaking hands contains two faces and a handshake). As the user trains the system, visual features (e.g., color, texture, motion, etc.) are extracted from each example provided, for each node of the hierarchy (defined by the user). Various machine learning algorithms are then applied to the training data, at each node, to learn classifiers. The best classifiers and features are then automatically selected for each node (using cross-validation on the training data). The process yields a Visual Object\/Scene Detector (e.g., for a handshake), which consists of an hierarchy of classifiers as it was defined by the user. The Visual Detector classifies new images\/videos by first automatically segmenting them, and applying the classifiers according to the hierarchy: regions are classified first, followed by the classification of perceptual-areas, object-parts and objects. We discuss how the concept of Recurrent Visual Semantics can be used to identify domains in which learning techniques such as the one presented can be applied. 
We then present experimental results using several hierarchies for classifying images and video shots (e.g., Baseball video, images that contain handshakes, skies, etc.). These results, which show good performance, demonstrate the feasibility and usefulness of dynamic approaches for constructing structured visual object\/scene detectors from user input at multiple levels. <\/jats:p>","DOI":"10.1142\/s0219467801000256","type":"journal-article","created":{"date-parts":[[2003,5,7]],"date-time":"2003-05-07T04:18:55Z","timestamp":1052281135000},"page":"415-444","source":"Crossref","is-referenced-by-count":10,"title":["LEARNING STRUCTURED VISUAL DETECTORS FROM USER INPUT AT MULTIPLE LEVELS"],"prefix":"10.1142","volume":"01","author":[{"given":"ALEJANDRO","family":"JAIMES","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, Columbia University, 500 West 120th Street MC 4712 New York, NY 10027, USA"}]},{"given":"SHIH-FU","family":"CHANG","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Columbia University, 500 West 120th Street MC 4712 New York, NY 10027, USA"}]}],"member":"219","published-online":{"date-parts":[[2011,11,20]]},"reference":[{"key":"p_1","first-page":"3964","volume":"2000","author":"Jaimes A.","year":"2000","journal-title":"Proceedings of SPIE Internet Imaging"},{"key":"p_4","doi-asserted-by":"publisher","DOI":"10.1109\/34.895972"},{"key":"p_5","doi-asserted-by":"publisher","DOI":"10.1006\/jvci.1999.0413"},{"key":"p_6","doi-asserted-by":"publisher","DOI":"10.1109\/69.755617"},{"key":"p_8","doi-asserted-by":"publisher","DOI":"10.1145\/265563.265573"},{"key":"p_12","doi-asserted-by":"publisher","DOI":"10.1109\/CAIVD.1998.646032"},{"key":"p_13","doi-asserted-by":"publisher","DOI":"10.1109\/83.892448"},{"key":"p_14","author":"Chang S.-F.","year":"2001","journal-title":"IEEE Trans. 
Circuits Syst Video Technology, Special Issue on MPEG-7"},{"key":"p_20","doi-asserted-by":"publisher","DOI":"10.1117\/12.333859"},{"key":"p_22","first-page":"3972","volume":"2000","author":"Jaimes A.","year":"2000","journal-title":"Proceedings of SPIE Storage and Retrieval for Media Databases"},{"key":"p_23","doi-asserted-by":"publisher","DOI":"10.1117\/12.143648"},{"key":"p_25","doi-asserted-by":"publisher","DOI":"10.1109\/76.718507"},{"key":"p_26","doi-asserted-by":"publisher","DOI":"10.1117\/12.234785"},{"issue":"1","key":"p_27","first-page":"85","volume":"3","author":"Bergman L. D.","year":"2000","journal-title":"Int. J. Digital Libraries, Special Issue \u201cIn the Tradition of the Alexandrian Scholars\u201d"},{"key":"p_29","doi-asserted-by":"publisher","DOI":"10.1117\/12.263446"},{"key":"p_31","doi-asserted-by":"publisher","DOI":"10.1109\/6046.909601"},{"key":"p_35","first-page":"158","author":"Rowley H. A.","year":"1995","journal-title":"Carnegie Mellon University, Technical Report CMU-CS-95"},{"key":"p_41","doi-asserted-by":"publisher","DOI":"10.1117\/12.360423"},{"key":"p_43","doi-asserted-by":"publisher","DOI":"10.1145\/234782.234805"},{"key":"p_46","doi-asserted-by":"publisher","DOI":"10.1109\/34.659936"},{"key":"p_47","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007919421801"},{"key":"p_49","first-page":"1492","volume":"2","author":"Zhong D.","year":"1997","journal-title":"Proceedings of IEEE International Conference on Circuits and Systems, Special Session on Networked Multimedia Technology & Applications"},{"key":"p_58","doi-asserted-by":"publisher","DOI":"10.1109\/34.574797"},{"key":"p_59","first-page":"192","author":"Kohavi R.","year":"1995","journal-title":"Proceedings of First International Conference on Knowledge Discovery and Data Mining"},{"key":"p_60","first-page":"1","volume":"1","author":"Salzberg S. 
L.","year":"1999","journal-title":"Data Mining and Knowledge Discovery"},{"key":"p_62","first-page":"94","author":"Kohavi R.","year":"1994","journal-title":"Proceedings of Conference on Tools with Artificial Intelligence"},{"key":"p_66","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.1985.6313426"},{"key":"p_68","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1117\/12.234795","volume":"2670","author":"Meng J.","year":"1996","journal-title":"Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Database"},{"key":"p_69","doi-asserted-by":"publisher","DOI":"10.1109\/34.877520"},{"key":"p_72","doi-asserted-by":"publisher","DOI":"10.1117\/1.482613"},{"key":"p_77","doi-asserted-by":"publisher","DOI":"10.1109\/76.718510"},{"key":"p_79","first-page":"6","author":"Tamura H.","year":"1978","journal-title":"IEEE Trans. Syst. Man, and Cybernetics SMC-8"},{"key":"p_80","doi-asserted-by":"publisher","DOI":"10.1016\/S0734-189X(85)80004-9"}],"container-title":["International Journal of Image and Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0219467801000256","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,6]],"date-time":"2019-08-06T21:01:13Z","timestamp":1565125273000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0219467801000256"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2001,7]]},"references-count":33,"journal-issue":{"issue":"03","published-online":{"date-parts":[[2011,11,20]]},"published-print":{"date-parts":[[2001,7]]}},"alternative-id":["10.1142\/S0219467801000256"],"URL":"https:\/\/doi.org\/10.1142\/s0219467801000256","relation":{},"ISSN":["0219-4678","1793-6756"],"issn-type":[{"value":"0219-4678","type":"print"},{"value":"1793-6756","type":"electronic"}],"subject":[],"published":{"date-parts":[[2001,7]]}}}