{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,26]],"date-time":"2026-01-26T11:11:05Z","timestamp":1769425865969,"version":"3.49.0"},"reference-count":32,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2022,3,10]],"date-time":"2022-03-10T00:00:00Z","timestamp":1646870400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"the European Commission","award":["H2020-ICT-20-2017-1-RIA-780612"],"award-info":[{"award-number":["H2020-ICT-20-2017-1-RIA-780612"]}]},{"name":"National Funds through the Portuguese funding agency, FCT - Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","award":["LA\/P\/0063\/2020"],"award-info":[{"award-number":["LA\/P\/0063\/2020"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>Applying machine learning (ML), and especially deep learning, to understand visual content is becoming common practice in many application areas. However, little attention has been given to its use within the multimedia creative domain. It is true that ML is already popular for content creation, but the progress achieved so far addresses essentially textual content or the identification and selection of specific types of content. A wealth of possibilities is yet to be explored by bringing the use of ML into the multimedia creative process, allowing the knowledge inferred by the former to automatically influence how new multimedia content is created. 
The work presented in this article provides contributions in three distinct ways towards this goal: firstly, it proposes a methodology to re-train popular neural network models to identify new thematic concepts in static visual content and attach meaningful annotations to the detected regions of interest; secondly, it presents varied visual digital effects and corresponding tools that can be automatically called upon to apply such effects to a previously analyzed photo; thirdly, it defines a complete automated creative workflow, from the acquisition of a photograph and corresponding contextual data, through the ML region-based annotation, to the automatic application of digital effects and generation of a semantically aware multimedia story driven by the previously derived situational and visual contextual data. Additionally, it presents a variant of this automated workflow by offering the user the possibility of manipulating the automatic annotations in an assisted manner. The final aim is to transform a static digital photo into a short video clip, taking into account the information acquired. 
The final result strongly contrasts with current standard approaches that create random movements, instead producing an intelligent content- and context-aware video.<\/jats:p>","DOI":"10.3390\/jimaging8030068","type":"journal-article","created":{"date-parts":[[2022,3,10]],"date-time":"2022-03-10T11:46:47Z","timestamp":1646912807000},"page":"68","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Photo2Video: Semantic-Aware Deep Learning-Based Video Generation from Still Content"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8447-2360","authenticated-orcid":false,"given":"Paula","family":"Viana","sequence":"first","affiliation":[{"name":"INESC TEC, 4200-465 Porto, Portugal"},{"name":"School of Engineering, Polytechnic of Porto, 4200-072 Porto, Portugal"}]},{"given":"Maria Teresa","family":"Andrade","sequence":"additional","affiliation":[{"name":"INESC TEC, 4200-465 Porto, Portugal"},{"name":"Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4983-4316","authenticated-orcid":false,"given":"Pedro","family":"Carvalho","sequence":"additional","affiliation":[{"name":"INESC TEC, 4200-465 Porto, Portugal"},{"name":"School of Engineering, Polytechnic of Porto, 4200-072 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3640-7019","authenticated-orcid":false,"given":"Luis","family":"Vila\u00e7a","sequence":"additional","affiliation":[{"name":"INESC TEC, 4200-465 Porto, Portugal"}]},{"given":"In\u00eas N.","family":"Teixeira","sequence":"additional","affiliation":[{"name":"INESC TEC, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3778-8773","authenticated-orcid":false,"given":"Tiago","family":"Costa","sequence":"additional","affiliation":[{"name":"INESC TEC, 4200-465 Porto, 
Portugal"}]},{"given":"Pieter","family":"Jonker","sequence":"additional","affiliation":[{"name":"QdepQ Systems, 2611NP Delft, The Netherlands"},{"name":"TU Delft Robotics Institute, 2600AA Delft, The Netherlands"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,10]]},"reference":[{"key":"ref_1","unstructured":"Vimeo, Inc. (2021, December 27). Online Video Editor: Smart Video Maker by Magisto. Available online: https:\/\/www.magisto.com\/."},{"key":"ref_2","unstructured":"Animoto Inc. (2021, December 27). Free Video Maker: Create Your Own Video Easily. Available online: https:\/\/www.animoto.com\/."},{"key":"ref_3","unstructured":"Flixel Services (2021, December 27). Create Imagery That Gets Noticed. Available online: https:\/\/www.flixel.com\/."},{"key":"ref_4","unstructured":"Yao, L., Peng, N., Weischedel, R., Knight, K., Zhao, D., and Yan, R. (February, January 27). Plan-and-write: Towards better automatic storytelling. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_5","unstructured":"Rehm, G., Zaczynska, K., and Moreno, J. (2019, January 14). Semantic Storytelling: Towards Identifying Storylines in Large Amounts of Text Content. Proceedings of the 41st European Conference on Information Retrieval, Text2Story@ ECIR, Cologne, Germany."},{"key":"ref_6","unstructured":"Bourgonje, P., Schneider, J.M., Rehm, G., and Sasaki, F. (2016, January 6). Processing document collections to automatically extract linked data: Semantic storytelling technologies for smart curation workflows. Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016), Edinburgh, UK."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Wang, J., Fu, J., Tang, J., Li, Z., and Mei, T. (2018, January 2\u20137). Show, Reward and Tell: Automatic Generation of Narrative Paragraph From Photo Stream by Adversarial Training. 
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12318"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2298","DOI":"10.1109\/TVCG.2019.2948611","article-title":"Content-Based Visual Summarization for Image Collections","volume":"27","author":"Pan","year":"2021","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Singh, A., Virmani, L., and Subramanyam, A. (2019, January 11\u201313). Image Corpus Representative Summarization. Proceedings of the 2019 IEEE 5th International Conference on Multimedia Big Data (BigMM), Singapore.","DOI":"10.1109\/BigMM.2019.00-46"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Li, Y., Geng, M., Liu, F., and Zhang, D. (2019, January 24\u201327). Visualization of photo album: Selecting a representative photo of a specific event. Proceedings of the International Conference on Database Systems for Advanced Applications, Jeju, Korea.","DOI":"10.1007\/978-3-030-18590-9_9"},{"key":"ref_11","unstructured":"Google (2021, December 27). Vision AI|Derive Image Insights via ML|Cloud Vision API. Available online: https:\/\/cloud.google.com\/vision."},{"key":"ref_12","unstructured":"Microsoft (2021, December 27). Microsoft Azure: Computer Vision. Available online: https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/computer-vision\/."},{"key":"ref_13","unstructured":"Clarifai Inc. (2021, December 27). Computer Vision and AI Enterprise Platform: All-in-One Tool. Available online: https:\/\/www.clarifai.com\/."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2352","DOI":"10.1162\/neco_a_00990","article-title":"Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review","volume":"29","author":"Rawat","year":"2017","journal-title":"Neural Comput."},{"key":"ref_15","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. 
(2012, January 12\u201315). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the Advances in Neural Information Processing Systems 25 (NIPS 2012), Lake Tahoe, NV, USA."},{"key":"ref_16","unstructured":"Nwankpa, C., Ijomah, W., Gachagan, A., and Marshall, S. (2018). Activation Functions: Comparison of trends in Practice and Research for Deep Learning. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going Deeper With Convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_19","unstructured":"Sharma, N., Jain, V., and Mishra, A. (2018, January 7\u20138). An Analysis Of Convolutional Neural Networks For Image Classification. Proceedings of the International Conference on Computational Intelligence and Data Science, Gurugram, India."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Choros, K., Kopel, M., Kukla, E., and Sieminski, A. (2019). YouTube Timed Metadata Enrichment Using a Collaborative Approach. Multimedia and Network Information Systems, Springer.","DOI":"10.1007\/978-3-319-98678-4"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Pinto, J.P., and Viana, P. (2015, January 24\u201325). Using the Crowd to Boost Video Annotation Processes: A Game Based Approach. 
Proceedings of the 12th European Conference on Visual Media Production, London, UK.","DOI":"10.1145\/2824840.2824853"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/s13673-017-0094-5","article-title":"A collaborative approach for semantic time-based video annotation using gamification","volume":"7","author":"Viana","year":"2017","journal-title":"Hum.-Cent. Comput. Inf. Sci."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Costa, T.S., Andrade, M.T., and Viana, P. (2021). Inferring Contextual Data from Real-World Photography. Advances in Intelligent Systems and Computing\u2014Intelligent Systems Design and Applications, Springer.","DOI":"10.1007\/978-3-030-71187-0_78"},{"key":"ref_24","unstructured":"Intel (2021, March 16). Intel Context Sensing SDK. Available online: https:\/\/www.youtube.com\/watch?v=DX9wP7ZhAOY."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., and Guadarrama, S. (2017, January 21\u201326). Speed\/accuracy trade-offs for modern convolutional object detectors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.351"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. (2016). Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"ref_27","unstructured":"Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., and Duerig, T. (2018). The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. 
arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhou, T., Li, J., Li, X., and Shao, L. (2021, January 20\u201325). Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00691"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"8326","DOI":"10.1109\/TIP.2020.3013162","article-title":"MATNet: Motion-Attentive Transition Network for Zero-Shot Video Object Segmentation","volume":"29","author":"Zhou","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"340","DOI":"10.1002\/col.1049","article-title":"The development of the CIE 2000 colour-difference formula: CIEDE2000","volume":"26","author":"Luo","year":"2001","journal-title":"Color Res. Appl."},{"key":"ref_32","first-page":"2141","article-title":"Efficient CIEDE2000-Based Color Similarity Decision for Computer Vision","volume":"30","author":"Pereira","year":"2020","journal-title":"IEEE Trans. Circuits Syst. 
Video Technol."}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/8\/3\/68\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:34:12Z","timestamp":1760135652000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/8\/3\/68"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,10]]},"references-count":32,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["jimaging8030068"],"URL":"https:\/\/doi.org\/10.3390\/jimaging8030068","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,10]]}}}