{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,23]],"date-time":"2025-11-23T13:31:32Z","timestamp":1763904692538,"version":"build-2065373602"},"reference-count":47,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2021,8,5]],"date-time":"2021-08-05T00:00:00Z","timestamp":1628121600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"publisher","award":["FA8750-20-2-1004"],"award-info":[{"award-number":["FA8750-20-2-1004"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Air Force Research Laboratory","award":["FA8750-20-2-1004"],"award-info":[{"award-number":["FA8750-20-2-1004"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>Identifying the source camera of images and videos has gained significant importance in multimedia forensics. It allows tracing back data to their creator, thus enabling to solve copyright infringement cases and expose the authors of hideous crimes. In this paper, we focus on the problem of camera model identification for video sequences, that is, given a video under analysis, detecting the camera model used for its acquisition. To this purpose, we develop two different CNN-based camera model identification methods, working in a novel multi-modal scenario. Differently from mono-modal methods, which use only the visual or audio information from the investigated video to tackle the identification task, the proposed multi-modal methods jointly exploit audio and visual information. We test our proposed methodologies on the well-known Vision dataset, which collects almost 2000 video sequences belonging to different devices. Experiments are performed, considering native videos directly acquired by their acquisition devices and videos uploaded on social media platforms, such as YouTube and WhatsApp. The achieved results show that the proposed multi-modal approaches significantly outperform their mono-modal counterparts, representing a valuable strategy for the tackled problem and opening future research to even more challenging scenarios.<\/jats:p>","DOI":"10.3390\/jimaging7080135","type":"journal-article","created":{"date-parts":[[2021,8,5]],"date-time":"2021-08-05T09:35:32Z","timestamp":1628156132000},"page":"135","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["CNN-Based Multi-Modal Camera Model Identification on Video Sequences"],"prefix":"10.3390","volume":"7","author":[{"given":"Davide","family":"Dal Cortivo","sequence":"first","affiliation":[{"name":"Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy"}]},{"given":"Sara","family":"Mandelli","sequence":"additional","affiliation":[{"name":"Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy"}]},{"given":"Paolo","family":"Bestagini","sequence":"additional","affiliation":[{"name":"Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy"}]},{"given":"Stefano","family":"Tubaro","sequence":"additional","affiliation":[{"name":"Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2021,8,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Hosler, B.C., Mayer, O., Bayar, B., Zhao, X., Chen, C., Shackleford, J.A., and Stamm, M.C. (2019, January 12\u201317). A Video Camera Model Identification System Using Deep Learning and Fusion. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682608"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kirchner, M., and Gloe, T. (2015). Forensic camera model identification. Handbook of Digital Forensics of Multimedia Data and Devices, Wiley-IEEE Press.","DOI":"10.1002\/9781118705773.ch9"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Takamatsu, J., Matsushita, Y., Ogasawara, T., and Ikeuchi, K. (2010, January 13\u201318). Estimating demosaicing algorithms using image noise variance. Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540200"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Kirchner, M. (2010, January 18\u201320). Efficient estimation of CFA pattern configuration in digital camera images. Proceedings of the Media Forensics and Security II, IS&T-SPIE Electronic Imaging Symposium, San Jose, CA, USA.","DOI":"10.1117\/12.839102"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"3948","DOI":"10.1109\/TSP.2005.855406","article-title":"Exposing digital forgeries in color filter array interpolated images","volume":"53","author":"Popescu","year":"2005","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Bayram, S., Sencar, H.T., Memon, N., and Avcibas, I. (2006, January 11\u201314). Improvements on source camera-model identification based on CFA interpolation. Proceedings of the WG 2006, Kobe, Japan.","DOI":"10.1109\/ICIP.2005.1530330"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1109\/TIFS.2006.890307","article-title":"Nonintrusive Component Forensics of Visual Sensors Using Output Images","volume":"2","author":"Swaminathan","year":"2007","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"899","DOI":"10.1109\/TIFS.2009.2033749","article-title":"Accurate detection of demosaicing regularity for digital image forensics","volume":"4","author":"Cao","year":"2009","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_9","unstructured":"Chen, C., and Stamm, M.C. (2015, January 16\u201319). Camera model identification framework using an ensemble of demosaicing features. Proceedings of the 2015 IEEE International Workshop on Information Forensics and Security, WIFS 2015, Roma, Italy."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"11551","DOI":"10.1364\/OE.14.011551","article-title":"Automatic source camera identification using the intrinsic lens radial distortion","volume":"14","author":"Lam","year":"2006","journal-title":"Opt. Express"},{"key":"ref_11","unstructured":"Lanh, T.V., Emmanuel, S., and Kankanhalli, M.S. (2007, January 2\u20135). Identifying Source Cell Phone using Chromatic Aberration. Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007, Beijing, China."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Gloe, T., Borowka, K., and Winkler, A. (2010, January 18\u201320). Efficient estimation and large-scale evaluation of lateral chromatic aberration for digital image forensics. Proceedings of the Media Forensics and Security II, IS&T-SPIE Electronic Imaging Symposium, San Jose, CA, USA.","DOI":"10.1117\/12.839034"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Yu, J., Craver, S., and Li, E. (2011, January 24\u201326). Toward the identification of DSLR lenses by chromatic aberration. Proceedings of the Media Forensics and Security III, San Francisco, CA, USA.","DOI":"10.1117\/12.872681"},{"key":"ref_14","unstructured":"Campisi, P., Dittmann, J., and Craver, S. (2010, January 9\u201310). Estimating vignetting function from a single image for image authentication. Proceedings of the Multimedia and Security Workshop, MM&Sec 2010, Roma, Italy."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1109\/TIFS.2008.926987","article-title":"Digital Single Lens Reflex Camera Identification From Traces of Sensor Dust","volume":"3","author":"Dirik","year":"2008","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1016\/j.dsp.2015.10.002","article-title":"Camera model identification based on the generalized noise model in natural images","volume":"48","author":"Thai","year":"2016","journal-title":"Digit. Signal Process."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Tuama, A., Comby, F., and Chaumont, M. (2016, January 4\u20137). Camera model identification with the use of deep convolutional neural networks. Proceedings of the IEEE International Workshop on Information Forensics and Security, WIFS 2016, Abu Dhabi, United Arab Emirates.","DOI":"10.1109\/WIFS.2016.7823908"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1109\/LSP.2016.2641006","article-title":"First Steps Toward Camera Model Identification With Convolutional Neural Networks","volume":"24","author":"Bondi","year":"2017","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1109\/MSP.2018.2847326","article-title":"Forensic Camera Model Identification: Highlights from the IEEE Signal Processing Cup 2018 Student Competition [SP Competitions]","volume":"35","author":"Stamm","year":"2018","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_20","unstructured":"Rafi, A.M., Kamal, U., Hoque, R., Abrar, A., Das, S., Lagani\u00e8re, R., and Hasan, M.K. (2019, January 16\u201320). Application of DenseNet in Camera Model Identification and Post-processing Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2019, Long Beach, CA, USA."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Mandelli, S., Bonettini, N., Bestagini, P., and Tubaro, S. (2020, January 6\u201311). Training CNNs in Presence of JPEG Compression: Multimedia Forensics vs Computer Vision. Proceedings of the 12th IEEE International Workshop on Information Forensics and Security, WIFS 2020, New York, NY, USA.","DOI":"10.1109\/WIFS49906.2020.9360903"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"3655","DOI":"10.1007\/s00521-020-05220-y","article-title":"RemNet: Remnant convolutional neural network for camera model identification","volume":"33","author":"Rafi","year":"2021","journal-title":"Neural Comput. Appl."},{"key":"ref_23","unstructured":"Tan, M., and Le, Q.V. (2019, January 9\u201315). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Proceedings of the 36th International Conference on Machine Learning, ICML 2019, PMLR, Long Beach, CA, USA."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Hershey, S., Chaudhuri, S., Ellis, D.P.W., Gemmeke, J.F., Jansen, A., Moore, C., Plakal, M., Platt, D., Saurous, R.A., and Seybold, B. (2017, January 5\u20139). CNN Architectures for Large-Scale Audio Classification. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952132"},{"key":"ref_25","unstructured":"Seferbekov, S., and Lee, E. (2021, July 27). DeepFake Detection (DFDC) Solution by @selimsef. Available online: https:\/\/github.com\/selimsef\/dfdc_deepfake_challenge."},{"key":"ref_26","unstructured":"Verdoliva, D.C.G.P.L. (2019, January 16\u201320). Extracting camera-based fingerprints for video forensics. Proceedings of the CVPRW, Long Beach, CA, USA."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1109\/TIFS.2019.2918644","article-title":"Facing device attribution problem for stabilized video sequences","volume":"15","author":"Mandelli","year":"2019","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Mayer, O., Hosler, B., and Stamm, M.C. (2020, January 4\u20138). Open set video camera model verification. Proceedings of the ICASSP 2020\u20132020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9054261"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"910","DOI":"10.1109\/JSTSP.2020.3002101","article-title":"Media forensics and deepfakes: An overview","volume":"14","author":"Verdoliva","year":"2020","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Hosler, B., Salvi, D., Murray, A., Antonacci, F., Bestagini, P., Tubaro, S., and Stamm, M.C. (2021, January 19\u201325). Do Deepfakes Feel Emotions? A Semantic Approach to Detecting Deepfakes via Emotional Inconsistencies. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Online.","DOI":"10.1109\/CVPRW53098.2021.00112"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Mittal, T., Bhattacharya, U., Chandra, R., Bera, A., and Manocha, D. (2020, January 12\u201316). Emotions Don\u2019t Lie: An Audio-Visual Deepfake Detection Method Using Affective Cues. Proceedings of the 28th ACM International Conference on Multimedia, MM \u201920, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413570"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Agarwal, S., Farid, H., Fried, O., and Agrawala, M. (2020, January 14\u201319). Detecting Deep-Fake Videos from Phoneme-Viseme Mismatches. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00338"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Agarwal, S., and Farid, H. (2021, January 15\u201319). Detecting Deep-Fake Videos From Aural and Oral Dynamics. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Online.","DOI":"10.1109\/CVPRW53098.2021.00109"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1109\/MSP.2005.1407713","article-title":"Color image processing pipeline","volume":"22","author":"Ramanath","year":"2005","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_35","unstructured":"Tabora, V. (2021, April 07). Photo Sensors In Digital Cameras. Available online: https:\/\/medium.com\/hd-pro\/photo-sensors-in-digital-cameras-94fb26203da1."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"329","DOI":"10.2307\/1417526","article-title":"The relation of pitch to frequency: A revised scale","volume":"53","author":"Stevens","year":"1940","journal-title":"Am. J. Psychol."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Shen, J., Pang, R., Weiss, R.J., Schuster, M., Jaitly, N., Yang, Z., Chen, Z., Zhang, Y., Wang, Y., and Skerrv-Ryan, R. (2018, January 15\u201320). Natural TTS synthesis by conditioning Wavenet on mel spectrogram predictions. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8461368"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"125868","DOI":"10.1109\/ACCESS.2019.2938007","article-title":"Speech emotion recognition from 3D log-mel spectrograms with deep learning network","volume":"7","author":"Meng","year":"2019","journal-title":"IEEE Access"},{"key":"ref_39","unstructured":"Mascia, M., Canclini, A., Antonacci, F., Tagliasacchi, M., Sarti, A., and Tubaro, S. (September, January 31). Forensic and anti-forensic analysis of indoor\/outdoor classifiers based on acoustic clues. Proceedings of the European Signal Processing Conference (EUSIPCO), Nice, France."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Liang, B., Fazekas, G., and Sandler, M. (2019, January 12\u201317). Piano Sustain-pedal Detection Using Convolutional Neural Networks. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8683505"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Comanducci, L., Bestagini, P., Tagliasacchi, M., Sarti, A., and Tubaro, S. (2021). Reconstructing Speech from CNN Embeddings. IEEE Signal Process. Lett.","DOI":"10.1109\/LSP.2021.3073628"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"139438","DOI":"10.1109\/ACCESS.2019.2943492","article-title":"Lung Sound Recognition Algorithm based on VGGish-BiGRU","volume":"7","author":"Shi","year":"2019","journal-title":"IEEE Access"},{"key":"ref_43","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s13635-017-0067-2","article-title":"VISION: A video and image dataset for source identification","volume":"2017","author":"Shullani","year":"2017","journal-title":"EURASIP J. Inf. Secur."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L., Li, K., and Li, F. (2009, January 20\u201325). ImageNet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Gemmeke, J.F., Ellis, D.P.W., Freedman, D., Jansen, A., Lawrence, W., Moore, R.C., Plakal, M., and Ritter, M. (2017, January 5\u20139). Audio Set: An ontology and human-labeled dataset for audio events. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"ref_47","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019, January 8\u201314). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada."}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/7\/8\/135\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:40:54Z","timestamp":1760164854000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/7\/8\/135"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,5]]},"references-count":47,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2021,8]]}},"alternative-id":["jimaging7080135"],"URL":"https:\/\/doi.org\/10.3390\/jimaging7080135","relation":{},"ISSN":["2313-433X"],"issn-type":[{"type":"electronic","value":"2313-433X"}],"subject":[],"published":{"date-parts":[[2021,8,5]]}}}