{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:22:05Z","timestamp":1750220525483,"version":"3.41.0"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"8","license":[{"start":{"date-parts":[[2021,7,26]],"date-time":"2021-07-26T00:00:00Z","timestamp":1627257600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Commun. ACM"],"published-print":{"date-parts":[[2021,8]]},"abstract":"<jats:p>Using clever video curation and processing practices to extract video training signals automatically.<\/jats:p>","DOI":"10.1145\/3431283","type":"journal-article","created":{"date-parts":[[2021,7,26]],"date-time":"2021-07-26T16:09:42Z","timestamp":1627315782000},"page":"69-79","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Unveiling unexpected training data in internet video"],"prefix":"10.1145","volume":"64","author":[{"given":"Tali","family":"Dekel","sequence":"first","affiliation":[{"name":"Weizmann Institute of Science, Rehovot, Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Noah","family":"Snavely","sequence":"additional","affiliation":[{"name":"Cornell Tech in the Cornell Graphics and Vision Group, New York, NY"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,7,26]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Abu-El-Haija S. Kothari N. Lee J. Natsev P. Toderici G. Varadarajan B. and Vijayanarasimhan S. 
YouTube-8m: A large-scale video classification benchmark (2016); arXiv:1609.08675."},{"volume-title":"Proc. of the 2015 Int. Conf. on Computer Vision, 37--45","author":"Agrawal P.","key":"e_1_2_1_2_1","unstructured":"Agrawal, P., Carreira, J., and Malik, J. Learning to see by moving. In Proc. of the 2015 Int. Conf. on Computer Vision, 37--45."},{"volume-title":"Proc. of the 2017 Int. Conf. on Computer Vision, 609--617","author":"Arandjelovic R.","key":"e_1_2_1_3_1","unstructured":"Arandjelovic, R. and Zisserman, A. Look, listen and learn. In Proc. of the 2017 Int. Conf. on Computer Vision, 609--617."},{"volume-title":"Proc. of the 2018 European Conf. on Computer Vision, 435--451","author":"Arandjelovic R.","key":"e_1_2_1_4_1","unstructured":"Arandjelovic, R. and Zisserman, A. Objects that sound. In Proc. of the 2018 European Conf. on Computer Vision, 435--451."},{"key":"e_1_2_1_5_1","volume-title":"SoundNet: Learning sound representations from unlabeled video. Neural Information Processing Systems","author":"Aytar Y.","year":"2016","unstructured":"Aytar, Y., Vondrick, C., and Torralba, A. SoundNet: Learning sound representations from unlabeled video. Neural Information Processing Systems (2016), 892--900."},{"key":"e_1_2_1_6_1","volume":"2018","author":"Caelles S.","unstructured":"Caelles, S., Montes, A., Maninis, K-K., Chen, Y., Van Gool, L., Perazzi, F., and Pont-Tuset, J. 
The 2018 DAVIS Challenge on Video Object Segmentation; arXiv:1803.00557.","journal-title":"J. The"},{"volume-title":"Proc. of the 2016 Conf. Computer Vision and Pattern Recognition.","author":"Castrejon L.","key":"e_1_2_1_7_1","unstructured":"Castrejon, L., Aytar, Y., Vondrick, C., Pirsiavash, H., and Torralba, A. Learning aligned cross-modal representations from weakly aligned data. In Proc. of the 2016 Conf. Computer Vision and Pattern Recognition."},{"key":"e_1_2_1_8_1","volume-title":"Single-image depth perception in the wild. Neural Information Processing Systems","author":"Chen W.","year":"2016","unstructured":"Chen, W., Fu, Z., Yang, D., and Deng, J. Single-image depth perception in the wild. Neural Information Processing Systems (2016), 730--738."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1907229"},{"volume-title":"Proc. of the 2017 Conf. Computer Vision and Pattern Recognition.","author":"Cole F.","key":"e_1_2_1_10_1","unstructured":"Cole, F., Belanger, D., Krishnan, D., Sarna, A., Mosseri, I., and Freeman, W.T. Synthesizing normalized faces from facial identity features. In Proc. of the 2017 Conf. Computer Vision and Pattern Recognition."},{"volume-title":"Proc. 
of the 2019 Conf. Computer Vision and Pattern Recognition.","author":"Dwibedi D.","key":"e_1_2_1_11_1","unstructured":"Dwibedi, D., Aytar, Y., Tompson, J., Sermanet, P., and Zisserman, A. Temporal cycle-consistency learning. In Proc. of the 2019 Conf. Computer Vision and Pattern Recognition."},{"key":"e_1_2_1_12_1","volume-title":"Depth map prediction from a single image using a multi-scale deep network. Neural Information Processing Systems","author":"Eigen D.","year":"2014","unstructured":"Eigen, D., Puhrsch, C., and Fergus, R. Depth map prediction from a single image using a multi-scale deep network. Neural Information Processing Systems (2014), 2366--2374."},{"key":"e_1_2_1_13_1","volume-title":"Flash photography enhancement via intrinsic relighting. ACM Trans. Graphics","author":"Eisemann E.","year":"2004","unstructured":"Eisemann, E. and Durand, F. Flash photography enhancement via intrinsic relighting. ACM Trans. Graphics (2004)."},{"key":"e_1_2_1_14_1","unstructured":"Ephrat A. Mosseri I. Lang O. Dekel T. Wilson K. Hassidim A. Freeman W.T. and Rubinstein M. AVSpeech Dataset (2018); https:\/\/looking-to-listen.github.io\/avspeech\/."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201357"},{"volume-title":"Proc. 
of the 2017 Computer Vision and Pattern Recognition, 3636--3645","author":"Fernando B.","key":"e_1_2_1_16_1","unstructured":"Fernando, B., Bilen, H., Gavves, E., and Gould, S. Self-supervised video representation learning with odd-one-out networks. In Proc. of the 2017 Computer Vision and Pattern Recognition, 3636--3645."},{"volume-title":"Proc. of the 2016 Conf. Computer Vision and Pattern Recognition.","author":"Flynn J.","key":"e_1_2_1_17_1","unstructured":"Flynn, J., Neulander, I., Philbin, J., and Snavely, N. DeepStereo: Learning to predict new views from the world's imagery. In Proc. of the 2016 Conf. Computer Vision and Pattern Recognition."},{"volume-title":"Proc. of the 2018 Conf. Computer Vision and Pattern Recognition, 4991--5000","author":"Fouhey D.F.","key":"e_1_2_1_18_1","unstructured":"Fouhey, D.F., Kuo, W., Efros, A.A., and Malik, J. From lifestyle vlogs to everyday interactions. In Proc. of the 2018 Conf. Computer Vision and Pattern Recognition, 4991--5000."},{"volume-title":"Proc. of the 2018 Conf. Computer Vision and Pattern Recognition.","author":"Fu H.","key":"e_1_2_1_19_1","unstructured":"Fu, H., Gong, M., Wang, C., Batmanghelich, K., and Tao, D. 
Deep ordinal regression network for monocular depth estimation. In Proc. of the 2018 Conf. Computer Vision and Pattern Recognition."},{"volume-title":"Proc. of the 2017 ICASSP, 776--780","author":"Gemmeke J.F.","key":"e_1_2_1_20_1","unstructured":"Gemmeke, J.F., Ellis, D.P.W., Freedman, D., Jansen, A., Lawrence, W., Moore, R.C., Plakal, M., and Ritter, M. AudioSet: An ontology and human-labeled dataset for audio events. In Proc. of the 2017 ICASSP, 776--780."},{"volume-title":"Proc. of the 2017 ICASSP.","author":"Gemmeke J.F.","key":"e_1_2_1_21_1","unstructured":"Gemmeke, J.F., Ellis, D.P.W., Freedman, D., Jansen, A., Lawrence, W., Moore, R.C., Plakal, M., and Ritter, M. AudioSet: An ontology and human-labeled dataset for audio events. In Proc. of the 2017 ICASSP."},{"volume-title":"Proc. of the 2018 Conf. Computer Vision and Pattern Recognition, 6047--6056","author":"Gu C.","key":"e_1_2_1_22_1","unstructured":"Gu, C. et al. AVA: A video dataset of spatio-temporally localized atomic visual actions. In Proc. of the 2018 Conf. Computer Vision and Pattern Recognition, 6047--6056."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2980179.2980251"},{"volume-title":"Proc. of the 2014 Conf. Computer Vision and Pattern Recognition.","author":"Karpathy A.","key":"e_1_2_1_24_1","unstructured":"Karpathy, A., Toderici, G., Shetty, S.
, Leung, T., Sukthankar, R., and Fei-Fei, L. Large-scale video classification with convolutional neural networks. In Proc. of the 2014 Conf. Computer Vision and Pattern Recognition."},{"key":"e_1_2_1_25_1","unstructured":"Kay W. et al. The kinetics human action video dataset (2017); arXiv:1705.06950."},{"volume-title":"Proc. 2019 IEEE Conf. Computer Vision and Pattern Recognition","author":"Kwon Y-H","key":"e_1_2_1_26_1","unstructured":"Kwon, Y-H and Park, M-G. Predicting future frames using retrospective cycle GAN. In Proc. 2019 IEEE Conf. Computer Vision and Pattern Recognition, 1811--1820."},{"key":"e_1_2_1_27_1","unstructured":"Li Z. Dekel T. Cole F. Tucker R. and Snavely N. MannequinChallenge Dataset (2019); https:\/\/google.github.io\/mannequinchallenge\/."},{"volume-title":"Proc. of the 2019 Conf. Computer Vision and Pattern Recognition, 4521--4530","author":"Li Z.","key":"e_1_2_1_28_1","unstructured":"Li, Z., Dekel, T., Cole, F., Tucker, R., Snavely, N., Liu, C., and Freeman, W.T. Learning the depths of moving people by watching frozen people. In Proc. of the 2019 Conf. 
Computer Vision and Pattern Recognition, 4521--4530."},{"volume-title":"Proc. of the 2017 IEEE Intern. Conf. on Computer Vision, 4463--4471","author":"Liu Z.","key":"e_1_2_1_29_1","unstructured":"Liu, Z., Yeh, R.A., Tang, X., Liu, Y., and Agarwala, A. Video frame synthesis using deep voxel flow. In Proc. of the 2017 IEEE Intern. Conf. on Computer Vision, 4463--4471."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_32"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2015.2463671"},{"volume-title":"Proc. of the 2018 European Conf. on Computer Vision.","author":"Owens A.","key":"e_1_2_1_32_1","unstructured":"Owens, A. and Efros, A.A. Audio-visual scene analysis with self-supervised multisensory features. In Proc. of the 2018 European Conf. on Computer Vision."},{"volume-title":"Proc. of the 2016 Conf. Computer Vision and Pattern Recognition, 2405--2413","author":"Owens A.","key":"e_1_2_1_33_1","unstructured":"Owens, A., Isola, P., McDermott, J., Torralba, A., Adelson, E.H., and Freeman, W.T. Visually indicated sounds. In Proc. of the 2016 Conf. Computer Vision and Pattern Recognition, 2405--2413."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1186562.1015777"},{"key":"e_1_2_1_35_1","unstructured":"Pont-Tuset J. Perazzi F. Caelles S. Arbel\u00e1ez P. Sorkine-Hornung A. and Van Gool L. 
The 2017 DAVIS challenge on video object segmentation; arXiv:1704.00675."},{"volume-title":"Proc. of the 2017 Conf. Computer Vision and Pattern Recognition, 5296--5305","author":"Real E.","key":"e_1_2_1_36_1","unstructured":"Real, E., Shlens, J., Mazzocchi, S., Pan, X., and Vanhoucke, V. YouTube-bounding boxes: A large high-precision human-annotated data set for object detection in video. In Proc. of the 2017 Conf. Computer Vision and Pattern Recognition, 5296--5305."},{"volume-title":"Proc. of the 2018 Conf. Computer Vision and Pattern Recognition.","author":"Senocak A.","key":"e_1_2_1_37_1","unstructured":"Senocak, A., Oh, T-H., Kim, J., Yang, M-H., and Kweon, I.S. Learning to localize sound source in visual scenes. In Proc. of the 2018 Conf. Computer Vision and Pattern Recognition."},{"volume-title":"Proc. of the 2018 IEEE Intern. Conf. Robotics and Automation.","author":"Sermanet P.","key":"e_1_2_1_38_1","unstructured":"Sermanet, P., Lynch, C., Chebotar, Y., Hsu, J., Jang, E., Schaal, S., and Levine, S. Time-contrastive networks: Self-supervised learning from video. In Proc. of the 2018 IEEE Intern. Conf. 
Robotics and Automation."},{"volume-title":"Proc. of the 2016 European Conf. on Computer Vision. Springer, 900--917","author":"Soler M.","key":"e_1_2_1_39_1","unstructured":"Soler, M., Bazin, J-C., Wang, O., Krause, A., and Sorkine-Hornung, A. Suggesting sounds for images from video collections. In Proc. of the 2016 European Conf. on Computer Vision. Springer, 900--917."},{"key":"e_1_2_1_40_1","unstructured":"Soomro K. Zamir A.R. and Shah M. UCF101: A dataset of 101 human actions classes from videos in the wild (2012); arXiv:1212.0402."},{"key":"e_1_2_1_41_1","volume-title":"Int. J. of Computer Vision","author":"Szeliski R.","year":"1999","unstructured":"Szeliski, R. and Golland, P. Stereo matching with transparency and matting. Int. J. of Computer Vision (1999)."},{"volume-title":"Proc. of the 2017 Conf. Computer Vision and Pattern Recognition.","author":"Ummenhofer B.","key":"e_1_2_1_42_1","unstructured":"Ummenhofer, B., Zhou, H., Uhrig, J., Mayer, N., Ilg, E., Dosovitskiy, A., and Brox, T. DeMoN: Depth and motion network for learning monocular stereo. In Proc. of the 2017 Conf. Computer Vision and Pattern Recognition."},{"key":"e_1_2_1_43_1","volume-title":"Generating videos with scene dynamics. 
Advances in Neural Information Processing Systems","author":"Vondrick C.","year":"2016","unstructured":"Vondrick, C., Pirsiavash, H., and Torralba, A. Generating videos with scene dynamics. Advances in Neural Information Processing Systems (2016), 613--621."},{"volume-title":"Proc. of the 2018 European Conf. on Computer Vision, 391--408","author":"Vondrick C.","key":"e_1_2_1_44_1","unstructured":"Vondrick, C., Shrivastava, A., Fathi, A., Guadarrama, S., and Murphy, K. Tracking emerges by colorizing videos. In Proc. of the 2018 European Conf. on Computer Vision, 391--408."},{"volume-title":"Proc. of the 2019 Conf. Computer Vision and Pattern Recognition, 1308--1317","author":"Wang N.","key":"e_1_2_1_45_1","unstructured":"Wang, N., Song, Y., Ma, C., Zhou, W., Liu, W., and Li, H. Unsupervised deep tracking. In Proc. of the 2019 Conf. Computer Vision and Pattern Recognition, 1308--1317."},{"volume-title":"Proc. of the 2019 Conf. Computer Vision and Pattern Recognition.","author":"Wang X.","key":"e_1_2_1_46_1","unstructured":"Wang, X., Jabri, A., and Efros, A.A. Learning correspondence from the cycle-consistency of time. In Proc. of the 2019 Conf. Computer Vision and Pattern Recognition."},{"volume-title":"Proc. of the 2018 Conf. 
Computer Vision and Pattern Recognition, 8052--8060","author":"Wei D.","key":"e_1_2_1_47_1","unstructured":"Wei, D., Lim, J.J., Zisserman, A., and Freeman, W.T. Learning and using the arrow of time. In Proc. of the 2018 Conf. Computer Vision and Pattern Recognition, 8052--8060."},{"key":"e_1_2_1_48_1","volume-title":"Multiplane camera","author":"Wikipedia","year":"2017","unstructured":"Wikipedia. Multiplane camera, 2017; https:\/\/en.wikipedia.org\/wiki\/Multiplane_camera."},{"key":"e_1_2_1_49_1","unstructured":"Wikipedia. Mannequin Challenge 2018; https:\/\/en.wikipedia.org\/wiki\/Mannequin_Challenge."},{"volume-title":"Proc. of the 2018 European Conf. on Computer Vision, 570--586","author":"Zhao H.","key":"e_1_2_1_50_1","unstructured":"Zhao, H., Gan, C., Rouditchenko, A., Vondrick, C., McDermott, J., and Torralba, A. The sound of pixels. In Proc. of the 2018 European Conf. 
on Computer Vision, 570--586."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201323"}],"container-title":["Communications of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3431283","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3431283","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:24:46Z","timestamp":1750195486000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3431283"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,26]]},"references-count":51,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2021,8]]}},"alternative-id":["10.1145\/3431283"],"URL":"https:\/\/doi.org\/10.1145\/3431283","relation":{},"ISSN":["0001-0782","1557-7317"],"issn-type":[{"type":"print","value":"0001-0782"},{"type":"electronic","value":"1557-7317"}],"subject":[],"published":{"date-parts":[[2021,7,26]]},"assertion":[{"value":"2021-07-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}