{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T01:49:33Z","timestamp":1776304173989,"version":"3.50.1"},"reference-count":48,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2022,3,8]],"date-time":"2022-03-08T00:00:00Z","timestamp":1646697600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The rapid development of machine learning technologies in recent years has led to the emergence of CNN-based sensors, or ML-enabled smart sensor systems, which are widely used in medical analytics, autonomous driving, Earth remote sensing, etc. In practice, the accuracy of CNN-based sensors depends heavily on the quality of the training datasets. The preparation of such datasets faces two fundamental challenges: data quantity and data quality. In this paper, we propose an approach aimed at solving both of these problems and investigate its efficiency. Our solution improves training datasets, and we validate it in several different applications: object classification and detection, depth buffer reconstruction, and panoptic segmentation. We present a pipeline for image dataset augmentation by synthesis with computer graphics and generative neural network approaches. Our solution is well-controlled and allows us to generate datasets in a reproducible manner with a desired distribution of features, which is essential for conducting specific experiments in computer vision. We developed a content creation pipeline designed to create realistic image sequences with highly variable content. Our technique allows rendering a single 3D object or 3D scene in a variety of ways, including changes to geometry, materials, and lighting. 
By using synthetic data in training, we have improved the accuracy of CNN-based sensors compared to using only real-life data.<\/jats:p>","DOI":"10.3390\/s22062080","type":"journal-article","created":{"date-parts":[[2022,3,9]],"date-time":"2022-03-09T01:50:53Z","timestamp":1646790653000},"page":"2080","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Image Synthesis Pipeline for CNN-Based Sensing Systems"],"prefix":"10.3390","volume":"22","author":[{"given":"Vladimir","family":"Frolov","sequence":"first","affiliation":[{"name":"Keldysh Institute of Applied Math RAS, 125047 Moscow, Russia"},{"name":"Faculty of Computational Mathematics and Cybernetics, Moscow State University, 119991 Moscow, Russia"}]},{"given":"Boris","family":"Faizov","sequence":"additional","affiliation":[{"name":"Faculty of Computational Mathematics and Cybernetics, Moscow State University, 119991 Moscow, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1586-9257","authenticated-orcid":false,"given":"Vlad","family":"Shakhuro","sequence":"additional","affiliation":[{"name":"Faculty of Computational Mathematics and Cybernetics, Moscow State University, 119991 Moscow, Russia"},{"name":"Samsung AI Research Center, 125196 Moscow, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6455-6444","authenticated-orcid":false,"given":"Vadim","family":"Sanzharov","sequence":"additional","affiliation":[{"name":"Keldysh Institute of Applied Math RAS, 125047 Moscow, Russia"},{"name":"Faculty of Computational Mathematics and Cybernetics, Moscow State University, 119991 Moscow, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6152-0021","authenticated-orcid":false,"given":"Anton","family":"Konushin","sequence":"additional","affiliation":[{"name":"Faculty of Computational Mathematics and Cybernetics, Moscow State University, 119991 Moscow, Russia"},{"name":"Samsung AI Research Center, 125196 Moscow, 
Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6460-7539","authenticated-orcid":false,"given":"Vladimir","family":"Galaktionov","sequence":"additional","affiliation":[{"name":"Keldysh Institute of Applied Math RAS, 125047 Moscow, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1252-8294","authenticated-orcid":false,"given":"Alexey","family":"Voloboy","sequence":"additional","affiliation":[{"name":"Keldysh Institute of Applied Math RAS, 125047 Moscow, Russia"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"2000063","DOI":"10.1002\/aisy.202000063","article-title":"Machine Learning-Enabled Smart Sensor Systems","volume":"2","author":"Ha","year":"2020","journal-title":"Adv. Intell. Syst."},{"key":"ref_2","unstructured":"Movshovitz-Attias, Y., Kanade, T., and Sheikh, Y. (2016). How useful is photo-realistic rendering for visual learning?. European Conference on Computer Vision, Proceedings of the ECCV 2016: Computer Vision\u2014ECCV 2016 Workshops, Amsterdam, The Netherlands, 8\u201310 and 15\u201316 October 2016, Springer."},{"key":"ref_3","unstructured":"Tsirikoglou, A., Kronander, J., Wrenninge, M., and Unger, J. (2017). Procedural Modeling and Physically Based Rendering for Synthetic Data Generation in Automotive Applications. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Hodan, T., Vineet, V., Gal, R., Shalev, E., Hanzelka, J., Connell, T., Urbina, P., Sinha, S.N., and Guenter, B. (2019). Photorealistic image synthesis for object instance detection. arXiv.","DOI":"10.1109\/ICIP.2019.8803821"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Shah, S., Dey, D., Lovett, C., and Kapoor, A. (2017). AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles. 
arXiv.","DOI":"10.1007\/978-3-319-67361-5_40"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Tremblay, J., Prakash, A., Acuna, D., Brophy, M., Jampani, V., Anil, C., To, T., Cameracci, E., Boochoon, S., and Birchfield, S. (2018). Training deep networks with synthetic data: Bridging the reality gap by domain randomization. arXiv.","DOI":"10.1109\/CVPRW.2018.00143"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Song, S., Yumer, E., Savva, M., Lee, J.Y., Jin, H., and Funkhouser, T. (2017, January 21\u201326). Physically-based rendering for indoor scene understanding using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.537"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., and Funkhouser, T. (2016). Semantic Scene Completion from a Single Depth Image. arXiv.","DOI":"10.1109\/CVPR.2017.28"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Li, Z., and Snavely, N. (2018, January 8\u201314). Cgintrinsics: Better intrinsic image decomposition through physically-based rendering. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01219-9_23"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Kirsanov, P., Gaskarov, A., Konokhov, F., Sofiiuk, K., Vorontsova, A., Slinko, I., Zhukov, D., Bykov, S., Barinova, O., and Konushin, A. (2019). DISCOMAN: Dataset of Indoor Scenes for Odometry, Mapping and Navigation. arXiv.","DOI":"10.1109\/IROS40897.2019.8967921"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"McCormac, J., Handa, A., Leutenegger, S., and Davison, A.J. (2017, January 22\u201329). SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation. 
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.292"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1007\/s11263-018-1070-x","article-title":"Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes","volume":"126","author":"Alhaija","year":"2018","journal-title":"Int. J. Comput. Vis."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1177\/0278364913491297","article-title":"Vision meets robotics: The kitti dataset","volume":"32","author":"Geiger","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1109\/38.988744","article-title":"Image-Based Lighting","volume":"22","author":"Debevec","year":"2002","journal-title":"IEEE Comput. Graph. Appl."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Valiev, I., Voloboy, A., and Galaktionov, V. (2008, January 21\u201323). Improved model of IBL sunlight simulation. Proceedings of the 24th Spring Conference on Computer Graphics, Budmerice, Slovakia.","DOI":"10.1145\/1921264.1921274"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Song, S., and Funkhouser, T. (2019, January 15\u201320). Neural Illumination: Lighting Prediction for Indoor Environments. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00708"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Garon, M., Sunkavalli, K., Hadap, S., Carr, N., and Lalonde, J.F. (2019, January 15\u201320). Fast Spatially-Varying Indoor Lighting Estimation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00707"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1134\/S0361768820030093","article-title":"Restoration of Lighting Parameters in Mixed Reality Systems Using Convolutional Neural Network Technology Based on RGBD Images","volume":"46","author":"Sorokin","year":"2020","journal-title":"Program. Comput. Softw."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3450626.3459872","article-title":"Total Relighting: Learning to Relight Portraits for Background Replacement","volume":"40","author":"Pandey","year":"2021","journal-title":"ACM Trans. Graph. (Proc. SIGGRAPH)"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Risi, S., and Togelius, J. (2019). Increasing Generality in Machine Learning through Procedural Content Generation. arXiv.","DOI":"10.1038\/s42256-020-0208-z"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Spick, R.J., Cowling, P., and Walker, J.A. (2019, January 20\u201323). Procedural Generation using Spatial GANs for Region-Specific Learning of Elevation Data. Proceedings of the IEEE Conference on Games (CoG), London, UK.","DOI":"10.1109\/CIG.2019.8848120"},{"key":"ref_22","unstructured":"Borkman, S., Crespi, A., Dhakad, S., Ganguly, S., Hogins, J., Jhang, Y.-C., Kamalzadeh, M., Li, B., Leal, S., and Parisi, P. (2021). Unity Perception: Generate Synthetic Data for Computer Vision. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Prakash, A., Debnath, S., Lafleche, J.-F., Cameracci, E., State, G., Birchfield, S., and Law, M.T. (2021, January 11\u201317). Self-Supervised Real-to-Sim Scene Generation. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01574"},{"key":"ref_24","unstructured":"Denninger, M., Sundermeyer, M., Winkelbauer, D., Zidan, Y., Olefir, D., Elbadrawy, M., Lodhi, A., and Katam, H. (2019). BlenderProc. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Xiang, F., Qin, Y., Mo, K., Xia, Y., Zhu, H., Liu, F., Liu, M., Jiang, H., Yuan, Y., and Wang, H. (2020, January 13\u201319). SAPIEN: A simulated part-based interactive environment. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01111"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Eftekhar, A., Sax, A., Malik, J., and Zamir, A. (2021, January 11\u201317). Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01061"},{"key":"ref_27","unstructured":"Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial networks. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Tripathi, S., Chandra, S., Agrawal, A., Tyagi, A., Rehg, J.M., and Chari, V. (2019, January 16\u201317). Learning to generate synthetic data via compositing. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00055"},{"key":"ref_29","unstructured":"Jaderberg, M., Simonyan, K., Zisserman, A., and Kavukcuoglu, K. (2015). Spatial transformer networks. arXiv."},{"key":"ref_30","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. 
arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., and Doll\u00e1r, P. (2014). Microsoft COCO: Common Objects in Context. arXiv.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Chen, B.-C., and Kae, A. (2019, January 16\u201317). Toward realistic image compositing with adversarial learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00861"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhu, J.-Y., Park, T., Isola, P., and Efros, A. (2017, January 22\u201329). Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.244"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Devaranjan, J., Kar, A., and Fidler, S. (2020). Meta-Sim2: Unsupervised learning of scene structure for synthetic data generation. European Conference on Computer Vision, ECCV 2020: Computer Vision\u2014ECCV 2020, 16th European Conference, Glasgow, UK, 23\u201328 August 2020, Springer.","DOI":"10.1007\/978-3-030-58520-4_42"},{"key":"ref_35","unstructured":"Frolov, V., Sanzharov, V., Trofimov, M., Pavlov, D., and Galaktionov, V. (2020, September 01). Hydra Renderer. Open Source GPU Based Rendering System. Github. Available online: https:\/\/github.com\/Ray-Tracing-Systems\/HydraAPI."},{"key":"ref_36","unstructured":"Sanzharov, V., Frolov, V., Voloboy, A., and Galaktionov, V. (2020, January 23\u201325). Sample scenes from a distribution: A content creation pipeline for realistic rendering for neural networks training. 
Proceedings of the International Conference on Computer Graphics, Visualization, Computer Vision and Image (CGVCVIP 2020), Zagreb, Croatia."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"294","DOI":"10.18287\/2412-6179-2016-40-2-294-300","article-title":"Russian traffic sign images dataset","volume":"40","author":"Shakhuro","year":"2016","journal-title":"Comput. Opt."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012). Indoor segmentation and support inference from rgbd images. European Conference on Computer Vision, Proceedings of the ECCV 2012: Computer Vision\u2014ECCV 2012, 12th European Conference on Computer Vision, Florence, Italy, 7\u201313 October 2012, Springer.","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Varol, G., Romero, J., Martin, X., Mahmood, N., Black, M.J., Laptev, I., and Schmid, C. (2017, January 21\u201326). Learning from synthetic humans. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.492"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Anoosheh, A., Sattler, T., Timofte, R., Pollefeys, M., and Van Gool, L. (2019, January 20\u201324). Night-to-day image translation for retrieval-based localization. Proceedings of the International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794387"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Zagoruyko, S., and Komodakis, N. (2016). Wide residual networks. arXiv.","DOI":"10.5244\/C.30.87"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Shakhuro, V., Faizov, B., and Konushin, A. (2019, January 20\u201323). Rare Traffic Sign Recognition using Synthetic Training Data. 
Proceedings of the 3rd International Conference on Video and Image Processing (ICVIP 2019), Shanghai, China.","DOI":"10.1145\/3376067.3376105"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Konushin, A., Faizov, B., and Shakhuro, V. (2021). Road images augmentation with synthetic traffic signs using neural networks. arXiv.","DOI":"10.18287\/2412-6179-CO-859"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Karras, T., Laine, S., and Aila, T. (2019, January 16\u201317). A style-based generator architecture for generative adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00453"},{"key":"ref_45","unstructured":"Kim, K.-H., Hong, S., Roh, B., Cheon, Y., and Park, M. (2016). Pvanet: Deep but lightweight neural networks for real-time object detection. arXiv."},{"key":"ref_46","unstructured":"Alhashim, I., and Wonka, P. (2018). High quality monocular depth estimation via transfer learning. arXiv."},{"key":"ref_47","unstructured":"Eigen, D., Puhrsch, C., and Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. arXiv."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Kirillov, A., Girshick, R., He, K., and Dollar, P. (2019, January 16\u201317). Panoptic feature pyramid networks. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00656"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/6\/2080\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:33:51Z","timestamp":1760135631000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/6\/2080"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,8]]},"references-count":48,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["s22062080"],"URL":"https:\/\/doi.org\/10.3390\/s22062080","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,8]]}}}