{"status":"ok","message-type":"work","message-version":"1.0.0","message":{
"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T17:43:21Z","timestamp":1771955001320,"version":"3.50.1"},
"reference-count":26,
"publisher":"MDPI AG",
"issue":"7",
"license":[{"start":{"date-parts":[[2018,7,12]],"date-time":"2018-07-12T00:00:00Z","timestamp":1531353600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],
"funder":[{"DOI":"10.13039\/100010661","name":"Horizon 2020","doi-asserted-by":"publisher","award":["688007"],"award-info":[{"award-number":["688007"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]}],
"content-domain":{"domain":[],"crossmark-restriction":false},
"short-container-title":["Sensors"],
"abstract":"<jats:p>The advance of scene understanding methods based on machine learning relies on the availability of large ground truth datasets, which are essential for their training and evaluation. Construction of such datasets with imagery from real sensor data, however, typically requires much manual annotation of semantic regions in the data, delivered by substantial human labour. To speed up this process, we propose a framework for semantic annotation of scenes captured by moving camera(s), e.g., mounted on a vehicle or robot. It makes use of an available 3D model of the traversed scene to project segmented 3D objects into each camera frame to obtain an initial annotation of the associated 2D image, which is followed by manual refinement by the user. The refined annotation can be transferred to the next consecutive frame using optical flow estimation. We have evaluated the efficiency of the proposed framework during the production of a labelled outdoor dataset. The analysis of annotation times shows that up to 43% less effort is required on average, and the consistency of the labelling is also improved.<\/jats:p>",
"DOI":"10.3390\/s18072249",
"type":"journal-article",
"created":{"date-parts":[[2018,7,12]],"date-time":"2018-07-12T11:19:24Z","timestamp":1531394364000},
"page":"2249",
"update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy",
"source":"Crossref",
"is-referenced-by-count":10,
"title":["Consistent Semantic Annotation of Outdoor Datasets via 2D\/3D Label Transfer"],
"prefix":"10.3390",
"volume":"18",
"author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6020-1141","authenticated-orcid":false,"given":"Radim","family":"Tylecek","sequence":"first","affiliation":[{"name":"School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6860-9371","authenticated-orcid":false,"given":"Robert B.","family":"Fisher","sequence":"additional","affiliation":[{"name":"School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK"}]}],
"member":"1968",
"published-online":{"date-parts":[[2018,7,12]]},
"reference":[
{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Fellbaum, C. (1998). WordNet: An Electronic Lexical Database, MIT Press.","DOI":"10.7551\/mitpress\/7287.001.0001"},
{"key":"ref_2","unstructured":"Chang, A.X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., and Su, H. (arXiv, 2015). ShapeNet: An Information-Rich 3D Model Repository, arXiv."},
{"key":"ref_3","unstructured":"Boom, B.J., Huang, P.X., He, J., and Fisher, R.B. (2012, November 11\u201315). Supporting ground-truth annotation of image datasets using clustering. Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan."},
{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Papadopoulos, D.P., Uijlings, J.R.R., Keller, F., and Ferrari, V. (2017, October 22\u201329). Extreme clicking for efficient object annotation. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.528"},
{"key":"ref_5","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1145\/1015706.1015720","article-title":"\u201cGrabCut\u201d: Interactive Foreground Extraction Using Iterated Graph Cuts","volume":"23","author":"Rother","year":"2004","journal-title":"ACM Trans. Graph."},
{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Nguyen, D.T., Hua, B.S., Yu, L.F., and Yeung, S.K. (arXiv, 2017). A Robust 3D-2D Interactive Tool for Scene Segmentation and Annotation, arXiv.","DOI":"10.1109\/TVCG.2017.2772238"},
{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Papadopoulos, D.P., Uijlings, J.R.R., Keller, F., and Ferrari, V. (2016, June 26\u2013July 1). We Do Not Need No Bounding-Boxes: Training Object Class Detectors Using Only Human Verification. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.99"},
{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, June 20\u201325). ImageNet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},
{"key":"ref_9","unstructured":"Deng, J., Russakovsky, O., Krause, J., Bernstein, M.S., Berg, A., and Fei-Fei, L. (2014, April 26\u2013May 1). Scalable Multi-label Annotation. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI \u201914), Toronto, ON, Canada."},
{"key":"ref_10","doi-asserted-by":"crossref","first-page":"725","DOI":"10.1007\/s00530-015-0491-4","article-title":"A diversity-based search approach to support annotation of a large fish image dataset","volume":"22","author":"Giordano","year":"2016","journal-title":"Multimed. Syst."},
{"key":"ref_11","unstructured":"Salvo, R.D., Spampinato, C., and Giordano, D. (2016, March 7\u20139). Generating reliable video annotations by exploiting the crowd. Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA."},
{"key":"ref_12","unstructured":"Yu, F., Zhang, Y., Song, S., Seff, A., and Xiao, J. (arXiv, 2015). LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop, arXiv."},
{"key":"ref_13","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1007\/s11263-007-0090-8","article-title":"LabelMe: A Database and Web-Based Tool for Image Annotation","volume":"77","author":"Russell","year":"2008","journal-title":"Int. J. Comput. Vis."},
{"key":"ref_14","doi-asserted-by":"crossref","first-page":"154:1","DOI":"10.1145\/2751556","article-title":"SemanticPaint: Interactive 3D Labeling and Learning at Your Fingertips","volume":"34","author":"Valentin","year":"2015","journal-title":"ACM Trans. Graph."},
{"key":"ref_15","unstructured":"Sattler, T., Brox, T., Pollefeys, M., Fisher, R.B., and Tylecek, R. (2017, October 22\u201329). 3D Reconstruction meets Semantics\u2014Reconstruction Challenge. Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, Venice, Italy."},
{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Hackel, T., Savinov, N., Ladicky, L., Wegner, J.D., Schindler, K., and Pollefeys, M. (arXiv, 2017). SEMANTIC3D.NET: A new large-scale point cloud classification benchmark, arXiv.","DOI":"10.5194\/isprs-annals-IV-1-W1-91-2017"},
{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Xie, J., Kiefel, M., Sun, M.T., and Geiger, A. (2016, June 26\u2013July 1). Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.401"},
{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Sch\u00f6ps, T., Sch\u00f6nberger, J.L., Galliani, S., Sattler, T., Schindler, K., Pollefeys, M., and Geiger, A. (2017, July 21\u201326). A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.272"},
{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2368","DOI":"10.1109\/TPAMI.2011.131","article-title":"Nonparametric Scene Parsing via Label Transfer","volume":"33","author":"Liu","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},
{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Fleet, D., Pajdla, T., Schiele, B., and Tuytelaars, T. (2014). Superpixel Graph Label Transfer with Learned Distance Metric. Proceedings of the European Conference on Computer Vision\u2014ECCV 2014, Zurich, Switzerland, 6\u201312 September 2014, Part III, Springer.","DOI":"10.1007\/978-3-319-10578-9"},
{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Furgale, P., Rehder, J., and Siegwart, R. (2013, November 3\u20137). Unified temporal and spatial calibration for multi-sensor systems. Proceedings of the 2013 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan.","DOI":"10.1109\/IROS.2013.6696514"},
{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Sch\u00f6nberger, J.L., and Frahm, J.M. (2016, June 26\u2013July 1). Structure-from-Motion Revisited. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.445"},
{"key":"ref_23","unstructured":"Girardeau-Montaut, D. (2017). CloudCompare\u20143D Point Cloud and Mesh Processing Software, Telecom ParisTech. Open Source Project."},
{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhang, W., Qi, J., Wan, P., Wang, H., Xie, D., Wang, X., and Yan, G. (2016). An Easy-to-Use Airborne LiDAR Data Filtering Method Based on Cloth Simulation. Remote Sens., 8.","DOI":"10.3390\/rs8060501"},
{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Revaud, J., Weinzaepfel, P., Harchaoui, Z., and Schmid, C. (2015, June 7\u201312). EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298720"},
{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2274","DOI":"10.1109\/TPAMI.2012.120","article-title":"SLIC Superpixels Compared to State-of-the-Art Superpixel Methods","volume":"34","author":"Achanta","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."}
],
"container-title":["Sensors"],
"original-title":[],
"language":"en",
"link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/18\/7\/2249\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],
"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:11:50Z","timestamp":1760195510000},
"score":1,
"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/18\/7\/2249"}},
"subtitle":[],
"short-title":[],
"issued":{"date-parts":[[2018,7,12]]},
"references-count":26,
"journal-issue":{"issue":"7","published-online":{"date-parts":[[2018,7]]}},
"alternative-id":["s18072249"],
"URL":"https:\/\/doi.org\/10.3390\/s18072249",
"relation":{},
"ISSN":["1424-8220"],
"issn-type":[{"value":"1424-8220","type":"electronic"}],
"subject":[],
"published":{"date-parts":[[2018,7,12]]}}}