{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T15:33:22Z","timestamp":1767972802302,"version":"3.49.0"},"reference-count":86,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2019,10,3]],"date-time":"2019-10-03T00:00:00Z","timestamp":1570060800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2019,10,3]],"date-time":"2019-10-03T00:00:00Z","timestamp":1570060800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000287","name":"Royal Academy of Engineering","doi-asserted-by":"crossref","award":["RF-201718-17177"],"award-info":[{"award-number":["RF-201718-17177"]}],"id":[{"id":"10.13039\/501100000287","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/P022529"],"award-info":[{"award-number":["EP\/P022529"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2020,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n<jats:p>Simultaneous semantically coherent object-based long-term 4D scene flow estimation, co-segmentation and reconstruction is proposed exploiting the coherence in semantic class labels both spatially, between views at a single time instant, and temporally, between widely spaced time instants of dynamic objects with similar shape and appearance. 
In this paper we propose a framework for spatially and temporally coherent semantic 4D scene flow of general dynamic scenes from multiple view videos captured with a network of static or moving cameras. Semantic coherence results in improved 4D scene flow estimation, segmentation and reconstruction for complex dynamic scenes. Semantic tracklets are introduced to robustly initialize the scene flow in the joint estimation and enforce temporal coherence in 4D flow, semantic labelling and reconstruction between widely spaced instances of dynamic objects. Tracklets of dynamic objects enable unsupervised learning of long-term flow, appearance and shape priors that are exploited in semantically coherent 4D scene flow estimation, co-segmentation and reconstruction. Comprehensive performance evaluation against state-of-the-art techniques on challenging indoor and outdoor sequences with hand-held moving cameras shows improved accuracy in 4D scene flow, segmentation, temporally coherent semantic labelling, and reconstruction of dynamic scenes.\n<\/jats:p>","DOI":"10.1007\/s11263-019-01241-w","type":"journal-article","created":{"date-parts":[[2019,10,3]],"date-time":"2019-10-03T17:03:36Z","timestamp":1570122216000},"page":"319-335","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Semantically Coherent 4D Scene Flow of Dynamic Scenes"],"prefix":"10.1007","volume":"128","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1779-2775","authenticated-orcid":false,"given":"Armin","family":"Mustafa","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4223-238X","authenticated-orcid":false,"given":"Adrian","family":"Hilton","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,10,3]]},"reference":[{"key":"1241_CR1","unstructured":"4d repository. In Institut national de recherche en informatique et en automatique (INRIA) Rhone Alpes. 
\nhttp:\/\/4drepository.inrialpes.fr\/\n\n."},{"issue":"4","key":"1241_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1778765.1778824","volume":"29","author":"L Ballan","year":"2010","unstructured":"Ballan, L., Brostow, G. J., Puwein, J., & Pollefeys, M. (2010). Unstructured video-based rendering: Interactive exploration of casually captured videos. ACM Transactions on Graphics, 29(4), 1\u201311.","journal-title":"ACM Transactions on Graphics"},{"key":"1241_CR3","unstructured":"Bao, Y., Chandraker, M., Lin, Y., & Savarese, S. (2013). Dense object reconstruction using semantic priors. In The IEEE international conference on computer vision and pattern recognition (CVPR)."},{"key":"1241_CR4","doi-asserted-by":"crossref","unstructured":"Basha, T., Moses, Y., & Kiryati, N. (2010). Multi-view scene flow estimation: A view centered variational approach. In The IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1506\u20131513).","DOI":"10.1109\/CVPR.2010.5539791"},{"key":"1241_CR5","doi-asserted-by":"crossref","unstructured":"Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2010). icoseg: Interactive co-segmentation with intelligent scribble guidance. In The IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2010.5540080"},{"issue":"4","key":"1241_CR6","doi-asserted-by":"publisher","first-page":"75:1","DOI":"10.1145\/2010324.1964970","volume":"30","author":"T Beeler","year":"2011","unstructured":"Beeler, T., Hahn, F., Bradley, D., Bickel, B., Beardsley, P., Gotsman, C., et al. (2011). High-quality passive facial performance capture using anchor frames. ACM Transaction in Graphics, 30(4), 75:1\u201375:10.","journal-title":"ACM Transaction in Graphics"},{"key":"1241_CR7","unstructured":"Behl, A., Jafari, O. H., Mustikovela, S. K., Alhaija, H. A., Rother, C., & Geiger, A. (2017). 
Bounding boxes, segmentations and object coordinates: How important is recognition for 3d scene flow estimation in autonomous driving scenarios? In Proceedings IEEE international conference on computer vision (ICCV). IEEE."},{"issue":"11","key":"1241_CR8","doi-asserted-by":"publisher","first-page":"1124","DOI":"10.1109\/TPAMI.2004.60","volume":"26","author":"Y Boykov","year":"2004","unstructured":"Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut\/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 26(11), 1124\u20131137.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"issue":"11","key":"1241_CR9","doi-asserted-by":"publisher","first-page":"1222","DOI":"10.1109\/34.969114","volume":"23","author":"Y Boykov","year":"2001","unstructured":"Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 23(11), 1222\u20131239.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"issue":"4","key":"1241_CR10","doi-asserted-by":"publisher","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","volume":"40","author":"L Chen","year":"2018","unstructured":"Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2018). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions in Pattern Analysis and Machine Intelligence (PAMI), 40(4), 834\u2013848.","journal-title":"IEEE Transactions in Pattern Analysis and Machine Intelligence (PAMI)"},{"key":"1241_CR11","doi-asserted-by":"crossref","unstructured":"Chen, L., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. 
In ECCV.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"1241_CR12","unstructured":"Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. CoRR \narXiv:1412.7062\n\n."},{"key":"1241_CR13","unstructured":"Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2016). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. CoRR \narXiv:1606.00915\n\n."},{"key":"1241_CR14","doi-asserted-by":"crossref","unstructured":"Chen, P.-Y.,\u00a0Liu, A. H., Wang, Y. C. F. (2019). Towards scene understanding: Unsupervised monocular depth estimation with semantic-aware representation. In The IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2019.00273"},{"key":"1241_CR15","doi-asserted-by":"crossref","unstructured":"Chiu, W. C., & Fritz, M. (2013). Multi-class video co-segmentation with a generative multi-video model. In The IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2013.48"},{"issue":"9","key":"1241_CR16","doi-asserted-by":"publisher","first-page":"1890","DOI":"10.1109\/TPAMI.2014.2385704","volume":"37","author":"A Djelouah","year":"2015","unstructured":"Djelouah, A., Franco, J. S., Boyer, E., Le Clerc, F., & Perez, P. (2015). Sparse multi-view consistency for object segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 37(9), 1890\u20131903.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"key":"1241_CR17","doi-asserted-by":"crossref","unstructured":"Djelouah, A., Franco, J. S., Boyer, E., P\u00e9rez, P., & Drettakis, G. (2016). Cotemporal multi-view video segmentation. 
In International conference on 3D vision (3DV).","DOI":"10.1109\/3DV.2016.45"},{"key":"1241_CR18","doi-asserted-by":"crossref","unstructured":"Engelmann, F., St\u00fcckler, J., & Leibe, B. (2016). Joint object pose estimation and shape reconstruction in urban street scenes using 3D shape priors. In Proceedings of the German Conference on Pattern Recognition (GCPR).","DOI":"10.1007\/978-3-319-45886-1_18"},{"issue":"10","key":"1241_CR19","doi-asserted-by":"publisher","first-page":"1858","DOI":"10.1109\/TPAMI.2008.113","volume":"30","author":"GD Evangelidis","year":"2008","unstructured":"Evangelidis, G. D., & Psarakis, E. Z. (2008). Parametric image alignment using enhanced correlation coefficient maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 30(10), 1858\u20131865.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"key":"1241_CR20","unstructured":"Everingham, M., Van\u00a0Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2012). The PASCAL visual object classes challenge (VOC2012) results. Retrieved September 5, 2017 from \nhttp:\/\/www.pascal-network.org\/challenges\/VOC\/voc2012\/workshop\/index.html\n\n."},{"issue":"8","key":"1241_CR21","doi-asserted-by":"publisher","first-page":"1915","DOI":"10.1109\/TPAMI.2012.231","volume":"35","author":"C Farabet","year":"2013","unstructured":"Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(8), 1915\u20131929.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"key":"1241_CR22","doi-asserted-by":"crossref","unstructured":"Floros, G., & Leibe, B. (2012). Joint 2d-3d temporally consistent semantic segmentation of street scenes. In The IEEE conference on computer vision and pattern recognition (CVPR) (pp. 
2823\u20132830).","DOI":"10.1109\/CVPR.2012.6248007"},{"key":"1241_CR23","doi-asserted-by":"crossref","unstructured":"Fu, H., Xu, D., Zhang, B., & Lin, S. (2014). Object-based multiple foreground video co-segmentation. In The IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2014.405"},{"key":"1241_CR24","doi-asserted-by":"crossref","unstructured":"Goldluecke, B., & Magnor, M. (2004). Space\u2013time isosurface evolution for temporally coherent 3d reconstruction. In The IEEE conference on computer vision and pattern recognition (CVPR) (pp. 350\u2013355).","DOI":"10.1109\/CVPR.2004.1315053"},{"key":"1241_CR25","doi-asserted-by":"crossref","unstructured":"Gupta, S., Girshick, R., Arbel\u00e1ez, P., & Malik, J. (2014). Learning rich features from RGB-D images for object detection and segmentation (pp. 345\u2013360).","DOI":"10.1007\/978-3-319-10584-0_23"},{"key":"1241_CR26","doi-asserted-by":"crossref","unstructured":"Hane, C., Zach, C., Cohen, A., & Pollefeys, M. (2013). Joint 3d scene reconstruction and class segmentation. In The IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2013.20"},{"key":"1241_CR27","doi-asserted-by":"publisher","first-page":"1730","DOI":"10.1109\/TPAMI.2016.2613051","volume":"39","author":"C Hane","year":"2016","unstructured":"Hane, C., Zach, C., Cohen, A., & Pollefeys, M. (2016). Dense semantic 3d reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 39, 1730\u20131743.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"key":"1241_CR28","doi-asserted-by":"crossref","unstructured":"Hariharan, B., Arbel\u00e1ez, P. A., Girshick, R. B., & Malik, J. (2015). Hypercolumns for object segmentation and fine-grained localization. In The IEEE conference on computer vision and pattern recognition (CVPR) (pp. 
447\u2013456).","DOI":"10.1109\/CVPR.2015.7298642"},{"key":"1241_CR29","volume-title":"Multiple view geometry in computer vision","author":"R Hartley","year":"2003","unstructured":"Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision (2nd ed.). Cambridge: Cambridge University Press.","edition":"2"},{"key":"1241_CR30","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., & Girshick, R. B. (2017). Mask R-CNN. CoRR \narXiv:1703.06870\n\n."},{"issue":"8","key":"1241_CR31","first-page":"2121","volume":"34","author":"X Hu","year":"2012","unstructured":"Hu, X., & Mordohai, P. (2012). A quantitative evaluation of confidence measures for stereo vision. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34(8), 2121\u20132133.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"issue":"7","key":"1241_CR32","doi-asserted-by":"publisher","first-page":"1325","DOI":"10.1109\/TPAMI.2013.248","volume":"36","author":"C Ionescu","year":"2014","unstructured":"Ionescu, C., Papava, D., Olaru, V., & Sminchisescu, C. (2014). Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7), 1325\u20131339.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"1241_CR33","doi-asserted-by":"crossref","unstructured":"Jiao, J., Cao, Y., Song, Y., & Lau, R. (2018). Look deeper into depth: Monocular depth estimation with semantic booster and attention-driven loss. In The European conference on computer vision (ECCV).","DOI":"10.1007\/978-3-030-01267-0_4"},{"key":"1241_CR34","doi-asserted-by":"crossref","unstructured":"Joulin, A., Bach, F., & Ponce, J. (2012). Multi-class cosegmentation. In The IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2012.6247719"},{"key":"1241_CR35","unstructured":"Kazhdan, M., Bolitho, M., & Hoppe, H. 
(2006). Poisson surface reconstruction. In Eurographics symposium on geometry processing (pp. 61\u201370)."},{"key":"1241_CR36","unstructured":"Kendall, A., Gal, Y., & Cipolla, R. (2017). Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. CoRR \narXiv:1705.07115\n\n."},{"key":"1241_CR37","doi-asserted-by":"publisher","first-page":"1175","DOI":"10.1007\/s11263-019-01164-6","volume":"127","author":"A Khoreva","year":"2019","unstructured":"Khoreva, A., Benenson, R., Ilg, E., Brox, T., & Schiele, B. (2019). Lucid data dreaming for video object segmentation. International Journal of Computer Vision (IJCV), 127, 1175\u20131197.","journal-title":"International Journal of Computer Vision (IJCV)"},{"issue":"11","key":"1241_CR38","doi-asserted-by":"publisher","first-page":"1611","DOI":"10.1109\/TCSVT.2012.2202185","volume":"22","author":"H Kim","year":"2012","unstructured":"Kim, H., Guillemaut, J., Takai, T., Sarim, M., & Hilton, A. (2012). Outdoor dynamic 3-D scene reconstruction. IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), 22(11), 1611\u20131622.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT)"},{"issue":"3","key":"1241_CR39","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1109\/TPAMI.2011.150","volume":"34","author":"K Kolev","year":"2012","unstructured":"Kolev, K., Brox, T., & Cremers, D. (2012). Fast joint estimation of silhouettes and dense 3d geometry from multiple images. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34(3), 493\u2013505.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"key":"1241_CR40","first-page":"703","volume":"8694","author":"A Kundu","year":"2014","unstructured":"Kundu, A., Li, Y., Dellaert, F., Li, F., & Rehg, J. M. (2014). Joint semantic segmentation and 3d reconstruction from monocular video. 
European Conference on Computer Vision (ECCV), 8694, 703\u2013718.","journal-title":"European Conference on Computer Vision (ECCV)"},{"key":"1241_CR41","doi-asserted-by":"crossref","unstructured":"Kundu, A., Vineet, V., & Koltun, V. (2016). Feature space optimization for semantic video segmentation. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3168\u20133175).","DOI":"10.1109\/CVPR.2016.345"},{"key":"1241_CR42","doi-asserted-by":"crossref","unstructured":"Langguth, F., Sunkavalli, K., Hadap, S., & Goesele, M. (2016). Shading-aware multi-view stereo. In European conference on computer vision (ECCV).","DOI":"10.1007\/978-3-319-46487-9_29"},{"key":"1241_CR43","doi-asserted-by":"crossref","unstructured":"Larsen, E., Mordohai, P., Pollefeys, M., & Fuchs, H. (2007). Temporally consistent reconstruction from multiple video streams using enhanced belief propagation. In The IEEE international conference on computer vision (ICCV) (pp. 1\u20138).","DOI":"10.1109\/ICCV.2007.4409013"},{"key":"1241_CR44","doi-asserted-by":"crossref","unstructured":"Li, P., Qin, T., & Shen, S. (2018). Stereo vision-based semantic 3d object and ego-motion tracking for autonomous driving. In The European conference on computer vision (ECCV).","DOI":"10.1007\/978-3-030-01216-8_40"},{"key":"1241_CR45","unstructured":"Lin, T. Y., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick, R. B., Hays, J., et al. (2014). Microsoft COCO: Common objects in context. CoRR \narXiv:1405.0312\n\n."},{"key":"1241_CR46","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In The IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2015.7298965"},{"issue":"2","key":"1241_CR47","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","volume":"60","author":"DG Lowe","year":"2004","unstructured":"Lowe, D. G. (2004). 
Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV), 60(2), 91\u2013110.","journal-title":"International Journal of Computer Vision (IJCV)"},{"key":"1241_CR48","doi-asserted-by":"crossref","unstructured":"Luo, B., Li, H., Song, T., & Huang, C. (2015). Object segmentation from long video sequences. In Proceedings of the 23rd ACM international conference on multimedia (pp. 1187\u20131190).","DOI":"10.1145\/2733373.2806313"},{"key":"1241_CR49","doi-asserted-by":"crossref","unstructured":"Maninis, K. K., Caelles, S., Pont-Tuset, J., & Van\u00a0Gool, L. (2018). Deep extreme cut: From extreme points to object segmentation. In The IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2018.00071"},{"key":"1241_CR50","doi-asserted-by":"crossref","unstructured":"Mostajabi, M., Yadollahpour, P., & Shakhnarovich, G. (2015). Feedforward semantic segmentation with zoom-out features. In The IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3376\u20133385).","DOI":"10.1109\/CVPR.2015.7298959"},{"key":"1241_CR51","unstructured":"Multiview video repository. In Centre for vision speech and signal processing, University of Surrey, UK. \nhttp:\/\/cvssp.org\/data\/cvssp3d\/\n\n."},{"key":"1241_CR52","doi-asserted-by":"crossref","unstructured":"Mustafa, A., & Hilton, A. (2017). Semantically coherent co-segmentation and reconstruction of dynamic scenes. In CVPR.","DOI":"10.1109\/CVPR.2017.592"},{"key":"1241_CR53","doi-asserted-by":"crossref","unstructured":"Mustafa, A., Kim, H., Guillemaut, J. Y., & Hilton, A. (2016). Temporally coherent 4d reconstruction of complex dynamic scenes. In The IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2016.504"},{"key":"1241_CR54","doi-asserted-by":"crossref","unstructured":"Mustafa, A., Kim, H., & Hilton, A. (2016). 4d match trees for non-rigid surface alignment. 
In European conference on computer vision (ECCV).","DOI":"10.1007\/978-3-319-46448-0_13"},{"key":"1241_CR55","first-page":"1118","volume":"28","author":"A Mustafa","year":"2019","unstructured":"Mustafa, A., Kim, H., & Hilton, A. (2019). Msfd: Multi-scale segmentation-based feature detection for wide-baseline scene reconstruction. IEEE TIP, 28, 1118\u20131132.","journal-title":"IEEE TIP"},{"key":"1241_CR56","doi-asserted-by":"crossref","unstructured":"Mustafa, A., Volino, M., Guillemaut, J. Y., & Hilton, A. (2017). 4d temporally coherent light-field video. In 3DV.","DOI":"10.1109\/3DV.2017.00014"},{"issue":"4","key":"1241_CR57","doi-asserted-by":"publisher","first-page":"108:1","DOI":"10.1145\/2897824.2925967","volume":"35","author":"F Prada","year":"2016","unstructured":"Prada, F., Kazhdan, M., Chuang, M., Collet, A., & Hoppe, H. (2016). Motion graphs for unstructured textured meshes. ACM Transaction in Graphics, 35(4), 108:1\u2013108:14.","journal-title":"ACM Transaction in Graphics"},{"key":"1241_CR58","unstructured":"Ranjan, A., Jampani, V., Kim, K., Sun, D., Wulff, J., & Black, M. J. (2018). Adversarial collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation. In IEEE conference on computer vision and pattern recognition (CVPR)."},{"key":"1241_CR59","unstructured":"Revaud, J., Weinzaepfel, P., Harchaoui, Z., & Schmid, C. (2015). Epicflow: Edge-preserving interpolation of correspondences for optical flow. CoRR \narXiv:1501.02565\n\n."},{"key":"1241_CR60","doi-asserted-by":"crossref","unstructured":"Rhodin, H., Robertini, N., Casas, D., Richardt, C., Seidel, H. P., & Theobalt, C. (2016). General automatic human shape and motion capture using volumetric contour cues. In European conference on computer vision (ECCV) (pp. 509\u2013526).","DOI":"10.1007\/978-3-319-46454-1_31"},{"key":"1241_CR61","doi-asserted-by":"crossref","unstructured":"Rother, C., Minka, T., Blake, A., & Kolmogorov, V. (2006). 
Cosegmentation of image pairs by histogram matching\u2014Incorporating a global constraint into mrfs. In The IEEE conference on computer vision and pattern recognition (CVPR) (pp. 993\u20131000).","DOI":"10.1109\/CVPR.2006.91"},{"key":"1241_CR62","doi-asserted-by":"crossref","unstructured":"Roussos, A., Russell, C., Garg, R., & Agapito, L. (2012). Dense multibody motion estimation and reconstruction from a handheld camera. In The IEEE international symposium on mixed and augmented reality (ISMAR).","DOI":"10.1109\/ISMAR.2012.6402535"},{"key":"1241_CR63","doi-asserted-by":"crossref","unstructured":"Rusu, R. B. (2009). Semantic 3d object maps for everyday manipulation in human living environments. Ph.D. thesis, Computer Science Department, Technische Universitaet Muenchen, Germany","DOI":"10.1007\/s13218-010-0059-6"},{"key":"1241_CR64","doi-asserted-by":"crossref","unstructured":"Sch\u00f6nberger, J. L., Zheng, E., Pollefeys, M., & Frahm, J. M. (2016). Pixelwise view selection for unstructured multi-view stereo. In European conference on computer vision (ECCV).","DOI":"10.1007\/978-3-319-46487-9_31"},{"key":"1241_CR65","doi-asserted-by":"publisher","first-page":"719","DOI":"10.1007\/978-3-319-10599-4_46","volume-title":"Computer Vision \u2013 ECCV 2014","author":"Ben Semerjian","year":"2014","unstructured":"Semerjian, B. (2014). A new variational framework for multiview surface reconstruction. In European conference on computer vision (ECCV) (pp. 719\u2013734)."},{"key":"1241_CR66","doi-asserted-by":"crossref","unstructured":"Sevilla-Lara, L., Sun, D., Jampani, V., & Black, M. J. (2016a). Optical flow with semantic segmentation and localized layers. CoRR \narXiv:1603.03911\n\n.","DOI":"10.1109\/CVPR.2016.422"},{"key":"1241_CR67","doi-asserted-by":"crossref","unstructured":"Sevilla-Lara, L., Sun, D., Jampani, V., & Black, M. J. (2016b). Optical flow with semantic segmentation and localized layers. 
In The IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3889\u20133898).","DOI":"10.1109\/CVPR.2016.422"},{"issue":"1\u20132","key":"1241_CR68","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1007\/s11263-009-0273-6","volume":"87","author":"L Sigal","year":"2010","unstructured":"Sigal, L., Balan, A., & Black, M. J. (2010). Humaneva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. International Journal of Computer Vision (IJCV), 87(1\u20132), 4\u201327.","journal-title":"International Journal of Computer Vision (IJCV)"},{"key":"1241_CR69","unstructured":"Tao, M. W., Bai, J., Kohli, P., & Paris, S. (2012). Simpleflow: A non-iterative, sublinear optical flow algorithm. Computer Graphics Forum (Eurographics 2012), 31(2):345\u2013353."},{"issue":"3","key":"1241_CR70","doi-asserted-by":"publisher","first-page":"282","DOI":"10.1007\/s11263-018-1122-2","volume":"127","author":"P Tokmakov","year":"2019","unstructured":"Tokmakov, P., Schmid, C., & Alahari, K. (2019). Learning to segment moving objects. International Journal of Computer Vision (IJCV), 127(3), 282\u2013301.","journal-title":"International Journal of Computer Vision (IJCV)"},{"key":"1241_CR71","doi-asserted-by":"crossref","unstructured":"Tsai, Y. H., Yang, M. H., & Black, M. J. (2016). Video segmentation via object flow. In IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2016.423"},{"key":"1241_CR72","doi-asserted-by":"crossref","unstructured":"Tsai, Y. H., Zhong, G., & Yang, M. H. (2016). Semantic co-segmentation in videos. In European conference on computer vision (ECCV) (pp. 760\u2013775).","DOI":"10.1007\/978-3-319-46493-0_46"},{"key":"1241_CR73","doi-asserted-by":"crossref","unstructured":"Tulsiani, S., Efros, A. A., & Malik, J. (2018). Multi-view consistency as supervisory signal for learning shape and pose prediction. 
In The IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2018.00306"},{"key":"1241_CR74","doi-asserted-by":"crossref","unstructured":"Vineet, V., Miksik, O., Lidegaard, M., Nie\u00dfner, M., Golodetz, S., Prisacariu, V. A., et al. (2015). Incremental dense semantic stereo fusion for large-scale semantic scene reconstruction. In IEEE international conference on robotics and automation (ICRA).","DOI":"10.1109\/ICRA.2015.7138983"},{"issue":"1","key":"1241_CR75","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1007\/s11263-010-0404-0","volume":"95","author":"A Wedel","year":"2011","unstructured":"Wedel, A., Brox, T., Vaudrey, T., Rabe, C., Franke, U., & Cremers, D. (2011). Stereoscopic scene flow computation for 3d motion understanding. International Journal of Computer Vision (IJCV), 95(1), 29\u201351.","journal-title":"International Journal of Computer Vision (IJCV)"},{"key":"1241_CR76","doi-asserted-by":"crossref","unstructured":"Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2013). Deepflow: Large displacement optical flow with deep matching. In The IEEE international conference on computer vision (ICCV) (pp. 1385\u20131392).","DOI":"10.1109\/ICCV.2013.175"},{"key":"1241_CR77","doi-asserted-by":"crossref","unstructured":"Xie, J., Kiefel, M., Sun, M. T., & Geiger, A. (2016). Semantic instance annotation of street scenes by 3d to 2d label transfer. In The IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2016.401"},{"key":"1241_CR78","doi-asserted-by":"crossref","unstructured":"Yang, G., Zhao, H., Shi, J., Deng, Z., & Jia, J. (2018). Segstereo: Exploiting semantic information for disparity estimation. In The European conference on computer vision (ECCV).","DOI":"10.1007\/978-3-030-01234-2_39"},{"key":"1241_CR79","doi-asserted-by":"crossref","unstructured":"Yin, Z., & Shi, J. (2018). Geonet: Unsupervised learning of dense depth, optical flow and camera pose. 
In CVPR.","DOI":"10.1109\/CVPR.2018.00212"},{"key":"1241_CR80","doi-asserted-by":"crossref","unstructured":"Zanfir, A., Marinoiu, E., & Sminchisescu, C. (2018). Monocular 3d pose and shape estimation of multiple people in natural scenes\u2014The importance of multiple scene constraints. In The IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2018.00229"},{"key":"1241_CR81","doi-asserted-by":"crossref","unstructured":"Zanfir, A., & Sminchisescu, C. (2015). Large displacement 3d scene flow with occlusion reasoning. In The IEEE international conference on computer vision (ICCV).","DOI":"10.1109\/ICCV.2015.502"},{"key":"1241_CR82","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Zhang, X., Peng, C., Xue, X., & Sun, J. (2018). Exfuse: Enhancing feature fusion for semantic segmentation. In The European conference on computer vision (ECCV).","DOI":"10.1007\/978-3-030-01249-6_17"},{"key":"1241_CR83","doi-asserted-by":"crossref","unstructured":"Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., et al. (2015). Conditional random fields as recurrent neural networks. In The IEEE international conference on computer vision (ICCV).","DOI":"10.1109\/ICCV.2015.179"},{"key":"1241_CR84","doi-asserted-by":"crossref","unstructured":"Zhu, X., Xiong, Y., Dai, J., Yuan, L., & Wei, Y. (2017). Deep feature flow for video recognition. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4141\u20134150).","DOI":"10.1109\/CVPR.2017.441"},{"issue":"3","key":"1241_CR85","doi-asserted-by":"publisher","first-page":"600","DOI":"10.1145\/1015706.1015766","volume":"23","author":"CL Zitnick","year":"2004","unstructured":"Zitnick, C. L., Kang, S. B., Uyttendaele, M., Winder, S., & Szeliski, R. (2004). High-quality video view interpolation using a layered representation. 
ACM Transaction on Graphics, 23(3), 600\u2013608.","journal-title":"ACM Transaction on Graphics"},{"key":"1241_CR86","doi-asserted-by":"crossref","unstructured":"Zou, Y., Luo, Z., & Huang, J. B. (2018). Df-net: Unsupervised joint learning of depth and flow using cross-task consistency. In European conference on computer vision.","DOI":"10.1007\/978-3-030-01228-1_3"}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-019-01241-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s11263-019-01241-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-019-01241-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,10,1]],"date-time":"2020-10-01T23:28:28Z","timestamp":1601594908000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s11263-019-01241-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,3]]},"references-count":86,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,2]]}},"alternative-id":["1241"],"URL":"https:\/\/doi.org\/10.1007\/s11263-019-01241-w","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,3]]},"assertion":[{"value":"25 January 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 September 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 October 
2019","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}