{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T21:00:26Z","timestamp":1770757226566,"version":"3.50.0"},"reference-count":60,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. & Syst."],"published-print":{"date-parts":[[2020,12,1]]},"DOI":"10.1587\/transinf.2020edp7056","type":"journal-article","created":{"date-parts":[[2020,11,30]],"date-time":"2020-11-30T22:19:42Z","timestamp":1606774782000},"page":"2578-2589","source":"Crossref","is-referenced-by-count":4,"title":["Predicting Violence Rating Based on Pairwise Comparison"],"prefix":"10.1587","volume":"E103.D","author":[{"given":"Ying","family":"JI","sequence":"first","affiliation":[{"name":"Graduate School of Informatics, Nagoya University"}]},{"given":"Yu","family":"WANG","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Ritsumeikan University"}]},{"given":"Jien","family":"KATO","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Ritsumeikan University"}]},{"given":"Kensaku","family":"MORI","sequence":"additional","affiliation":[{"name":"Graduate School of Informatics, Nagoya University"}]}],"member":"532","reference":[{"key":"1","unstructured":"[1] V. Rideout, \u201cThe common sense census: Media use by kids age zero to eight,\u201d San Francisco, CA: Common Sense Media, pp.263-283, 2017."},{"key":"2","unstructured":"[2] V.J. Rideout, The common sense census: Media use by tweens and teens, Common Sense Media Incorporated, 2015."},{"key":"3","doi-asserted-by":"publisher","unstructured":"[3] C.A. Anderson, B.J. Bushman, B.D. Bartholow, J. Cantor, D. Christakis, S.M. Coyne, E. Donnerstein, J.F. Brockmyer, D.A. Gentile, C.S. Green, R. Huesmann, T. Hummer, B. Krahe, V.C. 
Strasburger, W. Warburton, B.J. Wilson, and M. Ybarra, \u201cScreen violence and youth behavior,\u201d Pediatrics, vol.140, no.Supplement 2, pp.S142-S147, 2017. 10.1542\/peds.2016-1758t","DOI":"10.1542\/peds.2016-1758T"},{"key":"4","doi-asserted-by":"publisher","unstructured":"[4] L.D. Eron, L.R. Huesmann, M.M. Lefkowitz, and L.O. Walder, \u201cDoes television violence cause aggression?,\u201d American Psychologist, vol.27, no.4, pp.253-263, 1972. 10.1037\/h0033721","DOI":"10.1037\/h0033721"},{"key":"5","doi-asserted-by":"publisher","unstructured":"[5] C.A. Anderson, A. Sakamoto, D.A. Gentile, N. Ihori, A. Shibuya, S. Yukawa, M. Naito, and K. Kobayashi, \u201cLongitudinal effects of violent video games on aggression in japan and the united states,\u201d Pediatrics, vol.122, no.5, pp.e1067-e1072, 2008. 10.1542\/peds.2008-1425","DOI":"10.1542\/peds.2008-1425"},{"key":"6","doi-asserted-by":"publisher","unstructured":"[6] L.R. Huesmann, J. Moise-Titus, C.L. Podolski, and L.D. Eron, \u201cLongitudinal relations between children's exposure to tv violence and their aggressive and violent behavior in young adulthood: 1977-1992,\u201d Developmental psychology, vol.39, no.2, pp.201-221, 2003. 10.1037\/0012-1649.39.2.201","DOI":"10.1037\/0012-1649.39.2.201"},{"key":"7","doi-asserted-by":"publisher","unstructured":"[7] S. Benini, L. Canini, and R. Leonardi, \u201cA connotative space for supporting movie affective recommendation,\u201d IEEE Trans. Multimedia, vol.13, no.6, pp.1356-1370, 2011. 10.1109\/tmm.2011.2163058","DOI":"10.1109\/TMM.2011.2163058"},{"key":"8","doi-asserted-by":"publisher","unstructured":"[8] Y. Zhou, J. Wu, T.H. Chan, S.-W. Ho, D.-M. Chiu, and D. Wu, \u201cInterpreting video recommendation mechanisms by mining view count traces,\u201d IEEE Trans. Multimedia, vol.20, no.8, pp.2153-2165, 2017. 10.1109\/tmm.2017.2781364","DOI":"10.1109\/TMM.2017.2781364"},{"key":"9","unstructured":"[9] A.B. Jelodar, D. Paulius, and Y. 
Sun, \u201cLong activity video understanding using functional object-oriented network,\u201d IEEE Trans. Multimedia, vol.21, no.7, pp.1813-1824, 2018."},{"key":"10","unstructured":"[10] M.Y. Chen and A. Hauptmann, \u201cMosift: Recognizing human actions in surveillance videos,\u201d Tech. Rep., CMU-CS-09-161, Carnegie Mellon University, 2009."},{"key":"11","doi-asserted-by":"crossref","unstructured":"[11] F.D. De Souza, G.C. Ch\u00e1vez, E.A. do Valle Jr., and A.d.A. Ara\u00fajo, \u201cViolence detection in video using spatio-temporal features,\u201d 2010 23rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pp.224-230, IEEE, 2010. 10.1109\/sibgrapi.2010.38","DOI":"10.1109\/SIBGRAPI.2010.38"},{"key":"12","doi-asserted-by":"publisher","unstructured":"[12] I. Laptev, \u201cOn space-time interest points,\u201d International journal of computer vision, vol.64, no.2-3, pp.107-123, 2005. 10.1007\/s11263-005-1838-7","DOI":"10.1007\/s11263-005-1838-7"},{"key":"13","doi-asserted-by":"crossref","unstructured":"[13] T. Hassner, Y. Itcher, and O. Kliper-Gross, \u201cViolent flows: Real-time detection of violent crowd behavior,\u201d 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp.1-6, IEEE, 2012. 10.1109\/cvprw.2012.6239348","DOI":"10.1109\/CVPRW.2012.6239348"},{"key":"14","doi-asserted-by":"crossref","unstructured":"[14] J. Yang, Y.-G. Jiang, A.G. Hauptmann, and C.-W. Ngo, \u201cEvaluating bag-of-visual-words representations in scene classification,\u201d Proc. international workshop on Workshop on multimedia information retrieval, pp.197-206, ACM, 2007. 10.1145\/1290082.1290111","DOI":"10.1145\/1290082.1290111"},{"key":"15","doi-asserted-by":"crossref","unstructured":"[15] J. Lin and W. Wang, \u201cWeakly-supervised violence detection in movies with audio and video based co-training,\u201d Pacific-Rim Conference on Multimedia, pp.930-935, Springer, 2009. 
10.1007\/978-3-642-10467-1_84","DOI":"10.1007\/978-3-642-10467-1_84"},{"key":"16","doi-asserted-by":"crossref","unstructured":"[16] N. Derbas and G. Qu\u00e9not, \u201cJoint audio-visual words for violent scenes detection in movies,\u201d Proc. International Conference on Multimedia Retrieval, pp.483-486, ACM, 2014. 10.1145\/2578726.2578799","DOI":"10.1145\/2578726.2578799"},{"key":"17","doi-asserted-by":"publisher","unstructured":"[17] W. Hu, X. Ding, B. Li, J. Wang, Y. Gao, F. Wang, and S. Maybank, \u201cMulti-perspective cost-sensitive context-aware multi-instance sparse coding and its application to sensitive video recognition,\u201d IEEE Trans. Multimedia, vol.18, no.1, pp.76-89, 2015. 10.1109\/tmm.2015.2496372","DOI":"10.1109\/TMM.2015.2496372"},{"key":"18","doi-asserted-by":"crossref","unstructured":"[18] E.B. Nievas, O.D. Suarez, G.B. Garc\u00eda, and R. Sukthankar, \u201cViolence detection in video using computer vision techniques,\u201d International conference on Computer analysis of images and patterns, pp.332-339, Springer, 2011. 10.1007\/978-3-642-23678-5_39","DOI":"10.1007\/978-3-642-23678-5_39"},{"key":"19","doi-asserted-by":"crossref","unstructured":"[19] Y. Ji, Y. Wang, and J. Kato, \u201cVisual violence rating with pairwise comparison,\u201d 2019 IEEE International Conference on Image Processing (ICIP), pp.3332-3336, IEEE, 2019. 10.1109\/icip.2019.8803573","DOI":"10.1109\/ICIP.2019.8803573"},{"key":"20","doi-asserted-by":"crossref","unstructured":"[20] M. Schedl, M. Sj\u00f6berg, I. Mironica, B. Ionescu, V.L. Quang, and Y.G. Jiang, \u201cVsd2014: a dataset for violent scenes detection in hollywood movies and web videos,\u201d Sixth Sense, vol.6, no.2.00, pp.12-40, 2015.","DOI":"10.1109\/CBMI.2015.7153604"},{"key":"21","doi-asserted-by":"publisher","unstructured":"[21] L. Grealy, C. Driscoll, and K. Cather, \u201cA history of age-based film classification in japan,\u201d Japan Forum, pp.1-26, Taylor & Francis, 2020. 
10.1080\/09555803.2020.1778058","DOI":"10.1080\/09555803.2020.1778058"},{"key":"22","doi-asserted-by":"crossref","unstructured":"[22] D.A. Gentile, \u201cThe rating systems for media products,\u201d Handbook of children, media, and development, pp.527-551, 2008. 10.1002\/9781444302752.ch23","DOI":"10.1002\/9781444302752.ch23"},{"key":"23","doi-asserted-by":"publisher","unstructured":"[23] L. Jenkins, T. Webb, N. Browne, A.A. Afifi, and J. Kraus, \u201cAn evaluation of the motion picture association of america's treatment of violence in pg-, pg-13-, and r-rated films,\u201d Pediatrics, vol.115, no.5, pp.e512-e517, 2005. 10.1542\/peds.2004-1977","DOI":"10.1542\/peds.2004-1977"},{"key":"24","doi-asserted-by":"publisher","unstructured":"[24] Z. Tan and Y. Zhang, \u201cPredicting the top-n popular videos via a cross-domain hybrid model,\u201d IEEE Trans. Multimedia, vol.21, no.1, pp.147-156, 2018. 10.1109\/tmm.2018.2845688","DOI":"10.1109\/TMM.2018.2845688"},{"key":"25","doi-asserted-by":"publisher","unstructured":"[25] J. Lee and J.-S. Lee, \u201cMusic popularity: Metrics, characteristics, and audio-based prediction,\u201d IEEE Trans. Multimedia, vol.20, no.11, pp.3173-3182, 2018. 10.1109\/tmm.2018.2820903","DOI":"10.1109\/TMM.2018.2820903"},{"key":"26","doi-asserted-by":"publisher","unstructured":"[26] T. Trzci\u0144ski and P. Rokita, \u201cPredicting popularity of online videos using support vector regression,\u201d IEEE Trans. Multimedia, vol.19, no.11, pp.2561-2570, 2017. 10.1109\/tmm.2017.2695439","DOI":"10.1109\/TMM.2017.2695439"},{"key":"27","unstructured":"[27] X. Zhang, X. Gao, W. Lu, and L. He, \u201cA gated peripheral-foveal convolutional neural network for unified image aesthetic prediction,\u201d IEEE Trans. Multimedia, vol.21, no.11, pp.2815-2826, 2019."},{"key":"28","doi-asserted-by":"crossref","unstructured":"[28] J. Donahue and K. 
Grauman, \u201cAnnotator rationales for visual recognition,\u201d 2011 International Conference on Computer Vision, pp.1395-1402, IEEE, 2011. 10.1109\/iccv.2011.6126394","DOI":"10.1109\/ICCV.2011.6126394"},{"key":"29","doi-asserted-by":"crossref","unstructured":"[29] R. Datta, D. Joshi, J. Li, and J.Z. Wang, \u201cStudying aesthetics in photographic images using a computational approach,\u201d European conference on computer vision, pp.288-301, Springer, 2006. 10.1007\/11744078_23","DOI":"10.1007\/11744078_23"},{"key":"30","doi-asserted-by":"crossref","unstructured":"[30] N. Murray, L. Marchesotti, and F. Perronnin, \u201cAva: A large-scale database for aesthetic visual analysis,\u201d 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.2408-2415, IEEE, 2012. 10.1109\/cvpr.2012.6247954","DOI":"10.1109\/CVPR.2012.6247954"},{"key":"31","doi-asserted-by":"crossref","unstructured":"[31] R. Herbrich, T. Minka, and T. Graepel, \u201cTrueskill: a bayesian skill rating system,\u201d Advances in neural information processing systems, pp.569-576, 2007.","DOI":"10.7551\/mitpress\/7503.003.0076"},{"key":"32","doi-asserted-by":"publisher","unstructured":"[32] D.R. Hunter, \u201cMm algorithms for generalized bradley-terry models,\u201d The annals of statistics, vol.32, no.1, pp.384-406, 2004. 10.1214\/aos\/1079120141","DOI":"10.1214\/aos\/1079120141"},{"key":"33","doi-asserted-by":"crossref","unstructured":"[33] Y.G. Jiang, Y. Wang, R. Feng, X. Xue, Y. Zheng, and H. Yang, \u201cUnderstanding and predicting interestingness of videos,\u201d AAAI, 2013.","DOI":"10.1609\/aaai.v27i1.8457"},{"key":"34","doi-asserted-by":"crossref","unstructured":"[34] A. Dubey, N. Naik, D. Parikh, R. Raskar, and C.A. Hidalgo, \u201cDeep learning the city: Quantifying urban perception at a global scale,\u201d European Conference on Computer Vision, pp.196-212, Springer, 2016. 
10.1007\/978-3-319-46448-0_12","DOI":"10.1007\/978-3-319-46448-0_12"},{"key":"35","doi-asserted-by":"crossref","unstructured":"[35] M.H. Kiapour, K. Yamaguchi, A.C. Berg, and T.L. Berg, \u201cHipster wars: Discovering elements of fashion styles,\u201d European conference on computer vision, pp.472-488, Springer, 2014. 10.1007\/978-3-319-10590-1_31","DOI":"10.1007\/978-3-319-10590-1_31"},{"key":"36","doi-asserted-by":"crossref","unstructured":"[36] L. Palmer, A. Bialkowski, G.J. Brostow, J. Ambeck-Madsen, and N. Lavie, \u201cPredicting the perceptual demands of urban driving with video regression,\u201d 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp.409-417, IEEE, 2017. 10.1109\/wacv.2017.52","DOI":"10.1109\/WACV.2017.52"},{"key":"37","doi-asserted-by":"crossref","unstructured":"[37] L.-H. Chen, H.-W. Hsu, L.-Y. Wang, and C.-W. Su, \u201cViolence detection in movies,\u201d 2011 Eighth International Conference Computer Graphics, Imaging and Visualization, pp.119-124, IEEE, 2011. 10.1109\/cgiv.2011.14","DOI":"10.1109\/CGIV.2011.14"},{"key":"38","doi-asserted-by":"crossref","unstructured":"[38] T. Giannakopoulos, A. Makris, D. Kosmopoulos, S. Perantonis, and S. Theodoridis, \u201cAudio-visual fusion for detecting violent scenes in videos,\u201d Hellenic conference on artificial intelligence, pp.91-100, Springer, 2010. 10.1007\/978-3-642-12842-4_13","DOI":"10.1007\/978-3-642-12842-4_13"},{"key":"39","doi-asserted-by":"crossref","unstructured":"[39] A. Datta, M. Shah, and N.D.V. Lobo, \u201cPerson-on-person violence detection in video data,\u201d Object recognition supported by user interaction for service robots, pp.433-438, IEEE, 2002. 10.1109\/icpr.2002.1044748","DOI":"10.1109\/ICPR.2002.1044748"},{"key":"40","doi-asserted-by":"crossref","unstructured":"[40] H. Wang and C. Schmid, \u201cAction recognition with improved trajectories,\u201d Proc. IEEE international conference on computer vision, pp.3551-3558, 2013. 
10.1109\/iccv.2013.441","DOI":"10.1109\/ICCV.2013.441"},{"key":"41","unstructured":"[41] D. Cast\u00e1n, M. Rodr\u00edguez, A. Ortega, C. Orrite, and E. Lleida, \u201cVivolab and cvlab-mediaeval 2014: Violent scenes detection affect task,\u201d MediaEval, 2014."},{"key":"42","doi-asserted-by":"publisher","unstructured":"[42] V. Lam, S. Phan, D.-D. Le, D.A. Duong, and S. Satoh, \u201cEvaluation of multiple features for violent scenes detection,\u201d Multimedia Tools and Applications, vol.76, no.5, pp.7041-7065, 2017. 10.1007\/s11042-016-3331-4","DOI":"10.1007\/s11042-016-3331-4"},{"key":"43","unstructured":"[43] Q. Dai, R.W. Zhao, Z. Wu, X. Wang, Z. Gu, W. Wu, and Y.G. Jiang, \u201cFudan-huawei at mediaeval 2015: Detecting violent scenes and affective impact in movies with deep learning,\u201d MediaEval, 2015."},{"key":"44","unstructured":"[44] Q. Dai, Z. Wu, Y.G. Jiang, X. Xue, and J. Tang, \u201cFudan-njust at mediaeval 2014: Violent scenes detection using deep neural networks,\u201d MediaEval, 2014."},{"key":"45","unstructured":"[45] O. Seddati, E. Kulah, G. Pironkov, S. Dupont, S. Mahmoudi, and T. Dutoit, \u201cUmons at mediaeval 2015 affective impact of movies task including violent scenes detection,\u201d MediaEval, 2015."},{"key":"46","doi-asserted-by":"crossref","unstructured":"[46] X. Li, Y. Huo, Q. Jin, and J. Xu, \u201cDetecting violence in video using subclasses,\u201d Proc. 24th ACM international conference on Multimedia, pp.586-590, ACM, 2016. 10.1145\/2964284.2967289","DOI":"10.1145\/2964284.2967289"},{"key":"47","doi-asserted-by":"crossref","unstructured":"[47] Z. Dong, J. Qin, and Y. Wang, \u201cMulti-stream deep networks for person to person violence detection in videos,\u201d Chinese Conference on Pattern Recognition, pp.517-531, Springer, 2016. 10.1007\/978-981-10-3002-4_43","DOI":"10.1007\/978-981-10-3002-4_43"},{"key":"48","doi-asserted-by":"crossref","unstructured":"[48] A. Hanson, K. Pnvr, S. Krishnagopal, and L. 
Davis, \u201cBidirectional convolutional lstm for the detection of violence in videos,\u201d Proc. European Conference on Computer Vision (ECCV), pp.280-295, 2018. 10.1007\/978-3-030-11012-3_24","DOI":"10.1007\/978-3-030-11012-3_24"},{"key":"49","doi-asserted-by":"crossref","unstructured":"[49] E. Apostolidis and V. Mezaris, \u201cFast shot segmentation combining global and local visual descriptors,\u201d 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6583-6587, IEEE, 2014. 10.1109\/icassp.2014.6854873","DOI":"10.1109\/ICASSP.2014.6854873"},{"key":"50","unstructured":"[50] K. Simonyan and A. Zisserman, \u201cTwo-stream convolutional networks for action recognition in videos,\u201d Advances in neural information processing systems, pp.568-576, 2014."},{"key":"51","doi-asserted-by":"crossref","unstructured":"[51] D. Parikh and K. Grauman, \u201cRelative attributes,\u201d 2011 IEEE International Conference on Computer Vision (ICCV), pp.503-510, IEEE, 2011. 10.1109\/iccv.2011.6126281","DOI":"10.1109\/ICCV.2011.6126281"},{"key":"52","doi-asserted-by":"crossref","unstructured":"[52] T. Joachims, \u201cOptimizing search engines using clickthrough data,\u201d Proc. eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp.133-142, ACM, 2002. 10.1145\/775047.775067","DOI":"10.1145\/775047.775067"},{"key":"53","unstructured":"[53] A. Krizhevsky, I. Sutskever, and G.E. Hinton, \u201cImagenet classification with deep convolutional neural networks,\u201d Advances in neural information processing systems, pp.1097-1105, 2012."},{"key":"54","unstructured":"[54] K. Simonyan and A. Zisserman, \u201cVery deep convolutional networks for large-scale image recognition,\u201d arXiv preprint arXiv:1409.1556, 2014."},{"key":"55","doi-asserted-by":"crossref","unstructured":"[55] K. He, X. Zhang, S. Ren, and J. Sun, \u201cDeep residual learning for image recognition,\u201d Proc. 
IEEE conference on computer vision and pattern recognition, pp.770-778, 2016. 10.1109\/cvpr.2016.90","DOI":"10.1109\/CVPR.2016.90"},{"key":"56","doi-asserted-by":"crossref","unstructured":"[56] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool, \u201cTemporal segment networks: Towards good practices for deep action recognition,\u201d European conference on computer vision, pp.20-36, Springer, 2016. 10.1007\/978-3-319-46484-8_2","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"57","doi-asserted-by":"crossref","unstructured":"[57] Y. Wang, J. Song, L. Wang, L. Van Gool, and O. Hilliges, \u201cTwo-stream sr-cnns for action recognition in videos,\u201d BMVC, 2016. 10.5244\/c.30.108","DOI":"10.5244\/C.30.108"},{"key":"58","unstructured":"[58] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, \u201cDecaf: A deep convolutional activation feature for generic visual recognition,\u201d International conference on machine learning, pp.647-655, 2014."},{"key":"59","unstructured":"[59] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, \u201cLearning deep features for scene recognition using places database,\u201d Advances in neural information processing systems, pp.487-495, 2014."},{"key":"60","doi-asserted-by":"crossref","unstructured":"[60] M.D. Zeiler and R. Fergus, \u201cVisualizing and understanding convolutional networks,\u201d European conference on computer vision, pp.818-833, Springer, 2014. 
10.1007\/978-3-319-10590-1_53","DOI":"10.1007\/978-3-319-10590-1_53"}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E103.D\/12\/E103.D_2020EDP7056\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,18]],"date-time":"2024-08-18T08:51:12Z","timestamp":1723971072000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E103.D\/12\/E103.D_2020EDP7056\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,1]]},"references-count":60,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2020]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2020edp7056","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"value":"0916-8532","type":"print"},{"value":"1745-1361","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,1]]}}}