{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,30]],"date-time":"2025-11-30T09:09:07Z","timestamp":1764493747198,"version":"3.41.0"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2013,8,1]],"date-time":"2013-08-01T00:00:00Z","timestamp":1375315200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001381","name":"National Research Foundation-Prime Minister's office, Republic of Singapore","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001381","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Human Sixth Sense Programme at the Advanced Digital Sciences Center from Singapore's Agency for Science, Technology and Research"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2013,8]]},"abstract":"<jats:p>Decrypting the secret of beauty or attractiveness has been the pursuit of artists and philosophers for centuries. To date, the computational model for attractiveness estimation has been actively explored in computer vision and multimedia community, yet with the focus mainly on facial features. In this article, we conduct a comprehensive study on female attractiveness conveyed by single\/multiple modalities of cues, that is, face, dressing and\/or voice, and aim to discover how different modalities individually and collectively affect the human sense of beauty. To extensively investigate the problem, we collect the Multi-Modality Beauty (M<jats:sup>2<\/jats:sup>B) dataset, which is annotated with attractiveness levels converted from manual <jats:italic>k<\/jats:italic>-wise ratings and semantic attributes of different modalities. 
Inspired by the common consensus that middle-level attribute prediction can assist higher-level computer vision tasks, we manually labeled many attributes for each modality. Next, a tri-layer Dual-supervised Feature-Attribute-Task (DFAT) network is proposed to jointly learn the attribute model and attractiveness model of single\/multiple modalities. To remedy possible loss of information caused by incomplete manual attributes, we also propose a novel Latent Dual-supervised Feature-Attribute-Task (LDFAT) network, where latent attributes are combined with manual attributes to contribute to the final attractiveness estimation. The extensive experimental evaluations on the collected M<jats:sup>2<\/jats:sup>B dataset well demonstrate the effectiveness of the proposed DFAT and LDFAT networks for female attractiveness prediction.<\/jats:p>","DOI":"10.1145\/2501643.2501650","type":"journal-article","created":{"date-parts":[[2013,8,20]],"date-time":"2013-08-20T14:07:13Z","timestamp":1377007633000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":24,"title":["Towards decrypting attractiveness via multi-modality cues"],"prefix":"10.1145","volume":"9","author":[{"given":"Tam V.","family":"Nguyen","sequence":"first","affiliation":[{"name":"National University of Singapore, Singapore"}]},{"given":"Si","family":"Liu","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore"}]},{"given":"Bingbing","family":"Ni","sequence":"additional","affiliation":[{"name":"Advanced Digital Sciences Center, Singapore"}]},{"given":"Jun","family":"Tan","sequence":"additional","affiliation":[{"name":"National University of Defense Technology, Hunan, China"}]},{"given":"Yong","family":"Rui","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia, Beijing, China"}]},{"given":"Shuicheng","family":"Yan","sequence":"additional","affiliation":[{"name":"National University of 
Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2013,8,19]]},"reference":[{"volume-title":"Proceedings of the International Conference on Systems, Man and Cybernetics, 2644--2647","author":"Aarabi P.","key":"e_1_2_1_1_1","unstructured":"Aarabi, P., Hughes, D., Mohajer, K., and Emami, M. 2001. The automatic measurement of facial beauty. In Proceedings of the International Conference on Systems, Man and Cybernetics, 2644--2647."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-9280.1991.tb00113.x"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1177\/0146167205277097"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.993558"},{"volume-title":"Proceedings of the European Conference on Computer Vision. 663--676","author":"Berg T. L.","key":"e_1_2_1_5_1","unstructured":"Berg, T. L., Berg, A. C., and Shih, J. 2010. Automatic attribute discovery and characterization from noisy web data. In Proceedings of the European Conference on Computer Vision. 663--676."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126413"},{"key":"e_1_2_1_7_1","unstructured":"Brinton D. 1890. Races and Peoples: Lectures on the Science of Ethnography. N.D.C. 
Hodges."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1006\/cviu.1995.1004"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2005.177"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1364\/JOSAA.2.001160"},{"key":"e_1_2_1_11_1","first-page":"90","article-title":"What is beautiful is good","volume":"24","author":"Dion K.","year":"1972","unstructured":"Dion, K., Berscheid, E., and Walster, E. 1972. What is beautiful is good. J. Appl. Soc. Psych. 24, 90.","journal-title":"J. Appl. Soc. Psych."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976606774841602"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10508-009-9559-6"},{"volume-title":"Proceedings of the European Conference on Computer Vision. 434--447","author":"Gray D.","key":"e_1_2_1_14_1","unstructured":"Gray, D., Yu, K., Xu, W., and Gong, Y. 2010. Predicting facial beauty without landmarks. In Proceedings of the European Conference on Computer Vision. 434--447."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1068\/p240937"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 73--79","author":"Guo D.","key":"e_1_2_1_16_1","unstructured":"Guo, D. and Sim, T. 2009. Digital face makeup by example. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 73--79."},{"volume-title":"Neural Networks","author":"Haykin S.","key":"e_1_2_1_17_1","unstructured":"Haykin, S. 1999. Neural Networks. 
Prentice Hall."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.evolhumbehav.2004.06.001"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775067"},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Kagian A. Dror G. Leyvand T. Cohen-Or D. and Ruppin E. 2005. A humanlike predictor of facial attractiveness. Adv. Neural Inf. Process. Sys. 649--656.","DOI":"10.7551\/mitpress\/7503.003.0086"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-88693-8_25"},{"volume-title":"Proceedings of the International Society for Music Information Retrieval Conference. 127--130","author":"Lartillot O.","key":"e_1_2_1_22_1","unstructured":"Lartillot, O. and Toiviainen, P. 2007. MIR in Matlab: A toolbox for musical feature extraction from audio. In Proceedings of the International Society for Music Information Retrieval Conference. 127--130."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1177\/1077727X9001800403"},{"key":"e_1_2_1_24_1","unstructured":"Likert R. 1932. A technique for the measurement of attitudes. Arch. Psych. 22 140 1--55."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2393347.2396470"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3330--3337","author":"Liu S.","key":"e_1_2_1_26_1","unstructured":"Liu, S., Song, Z., Liu, G., Xu, C., Lu, H., and Yan, S. 2012b. 
Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3330--3337."},{"volume-title":"Proceedings of the British Machine Vision Conference. 1--11","author":"Mittal A.","key":"e_1_2_1_27_1","unstructured":"Mittal, A., Zisserman, A., and Torr, P. H. S. 2011. Hand detection using multiple proposals. In Proceedings of the British Machine Vision Conference. 1--11."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2393347.2393385"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2002.1017623"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1011139631724"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.visres.2009.11.003"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126281"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1038\/14819"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126355"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cognition.2003.09.011"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000013087.49260.fb"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995741"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01001960"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2501643.2501650","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2501643.2501650","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:28:48Z","timestamp":1750231728000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2501643.2501650"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,8]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,8]]}},"alternative-id":["10.1145\/2501643.2501650"],"URL":"https:\/\/doi.org\/10.1145\/2501643.2501650","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2013,8]]},"assertion":[{"value":"2012-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-08-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}