{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T05:21:19Z","timestamp":1780636879013,"version":"3.54.1"},"reference-count":74,"publisher":"Springer Science and Business Media LLC","issue":"12","license":[{"start":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T00:00:00Z","timestamp":1633392000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T00:00:00Z","timestamp":1633392000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004505","name":"Universit\u00e0 degli Studi di Catania","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004505","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this work, we propose a 3D fully convolutional architecture for video saliency prediction that employs hierarchical supervision on intermediate maps (referred to as <jats:italic>conspicuity maps<\/jats:italic>) generated using features extracted at different abstraction levels. We provide the base hierarchical learning mechanism with two techniques for <jats:italic>domain adaptation<\/jats:italic> and <jats:italic>domain-specific learning<\/jats:italic>. For the former, we encourage the model to unsupervisedly learn hierarchical general features using gradient reversal at multiple scales, to enhance generalization capabilities on datasets for which no annotations are provided during training. As for domain specialization, we employ domain-specific operations (namely, priors, smoothing and batch normalization) by specializing the learned features on individual datasets in order to maximize performance. The results of our experiments show that the proposed model yields state-of-the-art accuracy on supervised saliency prediction. When the base hierarchical model is empowered with domain-specific modules, performance improves, outperforming state-of-the-art models on three out of five metrics on the DHF1K benchmark and reaching the second-best results on the other two. When, instead, we test it in an unsupervised domain adaptation setting, by enabling hierarchical gradient reversal layers, we obtain performance comparable to supervised state-of-the-art. Source code, trained models and example outputs are publicly available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/perceivelab\/hd2s\">https:\/\/github.com\/perceivelab\/hd2s<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/s11263-021-01519-y","type":"journal-article","created":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T18:19:34Z","timestamp":1633457974000},"page":"3216-3232","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":73,"title":["Hierarchical Domain-Adapted Feature Learning for Video Saliency Prediction"],"prefix":"10.1007","volume":"129","author":[{"given":"G.","family":"Bellitto","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"F.","family":"Proietto Salanitri","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2441-0982","authenticated-orcid":false,"given":"S.","family":"Palazzo","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"F.","family":"Rundo","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"D.","family":"Giordano","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"C.","family":"Spampinato","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2021,10,5]]},"reference":[{"issue":"12","key":"1519_CR1","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","volume":"39","author":"V Badrinarayanan","year":"2017","unstructured":"Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). Segnet: A deep convolutional encoderdecoder architecture for image segmentation. IEEE TPAMI, 39(12), 2481\u20132495.","journal-title":"IEEE TPAMI"},{"issue":"7","key":"1519_CR2","first-page":"1688","volume":"20","author":"C Bak","year":"2017","unstructured":"Bak, C., et al. (2017). Spatio-temporal saliency networks for dynamic saliency prediction. IEEE TMM, 20(7), 1688\u20131698.","journal-title":"IEEE TMM"},{"key":"1519_CR3","unstructured":"Bazzani, L., Larochelle, H., Torresani L. (2016). Recurrent mixture density network for spatiotemporal visual attention . In: arXiv preprint arXiv:1603.08199 (2016)."},{"key":"1519_CR4","unstructured":"Borji, A., Itti, L. (2015). Cat2000: A large scale fixation dataset for boosting saliency research . In: arXiv preprint arXiv:1505.03581"},{"issue":"3","key":"1519_CR5","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1109\/TPAMI.2018.2815601","volume":"41","author":"Z Bylinskii","year":"2018","unstructured":"Bylinskii, Z., et al. (2018). What do different evaluation metrics tell us about saliency models? IEEE TPAMI, 41(3), 740\u2013757.","journal-title":"IEEE TPAMI"},{"key":"1519_CR6","doi-asserted-by":"crossref","unstructured":"Chang W.-G. et al. (2019). Domain-specific batch normalization for unsupervised domain adaptation . In: Proceedings of the IEEE\/CVF CVPR. pp. 7354\u20137362.","DOI":"10.1109\/CVPR.2019.00753"},{"key":"1519_CR7","first-page":"2287","volume":"29","author":"Z Che","year":"2019","unstructured":"Che, Z., et al. (2019). How is gaze influenced by image transformations? dataset and model. IEEE TIP, 29, 2287\u20132300.","journal-title":"IEEE TIP"},{"key":"1519_CR8","unstructured":"Yangyu C. et al. (2018). \u201cSaliency-based spatiotemporal attention for video captioning\u201d. In: 2018 IEEE BigMM. IEEE. pp. 1\u20138."},{"key":"1519_CR9","doi-asserted-by":"crossref","unstructured":"Cornia M. et al. (2016). A deep multi-level network for saliency prediction . In: ICPR. IEEE. pp. 3488\u20133493.","DOI":"10.1109\/ICPR.2016.7900174"},{"issue":"10","key":"1519_CR10","first-page":"5142","volume":"27","author":"M Cornia","year":"2018","unstructured":"Cornia, M., et al. (2018). Predicting human eye fixations via an LSTM-based saliency attentive model. IEEE TIP, 27(10), 5142\u20135154.","journal-title":"IEEE TIP"},{"key":"1519_CR11","doi-asserted-by":"crossref","unstructured":"Dosovitskiy, A. et al. (2015). Flownet: Learning optical flow with convolutional networks . In: ICCV. pp. 2758\u20132766.","DOI":"10.1109\/ICCV.2015.316"},{"key":"1519_CR12","first-page":"419","volume-title":"ECCV","author":"R Droste","year":"2020","unstructured":"Droste, R., Jiao, J., & Alison, N. J. (2020). Unified image and video saliency modeling. ECCV (pp. 419\u2013435). Berlin: Springer."},{"key":"1519_CR13","doi-asserted-by":"crossref","unstructured":"Fan, S. et al. (2018). Emotional attention: A study of image sentiment and visual attention . In: Proceedings of the IEEE CVPR. pp. 7521\u20137531.","DOI":"10.1109\/CVPR.2018.00785"},{"issue":"1","key":"1519_CR14","first-page":"2096","volume":"17","author":"Y Ganin","year":"2016","unstructured":"Ganin, Y., et al. (2016). Domain-adversarial training of neural networks. JMLR, 17(1), 2096\u20132030.","journal-title":"JMLR"},{"key":"1519_CR15","doi-asserted-by":"crossref","unstructured":"Girshick, S. (2015). Fast R-CNN . In: Proceedings of the IEEE ICCV.","DOI":"10.1109\/ICCV.2015.169"},{"key":"1519_CR16","unstructured":"Goodfellow, I. et al. (2014). Generative adversarial networks . In: arXiv preprint arXiv:1406.2661."},{"key":"1519_CR17","doi-asserted-by":"crossref","unstructured":"Guraya, F.F.E., et al. (2010). Predictive saliency maps for surveillance videos. In: DCABES. IEEE. pp. 508\u2013513.","DOI":"10.1109\/DCABES.2010.160"},{"key":"1519_CR18","unstructured":"Harel, J., Koch, C., Perona, P. (2007). Graph-based visual saliency . In: NIPS. pp. 545\u2013552."},{"key":"1519_CR19","doi-asserted-by":"crossref","unstructured":"He, K., et al. (2020). Mask R-CNN. In: IEEE TPAMI, 42(2), 386\u2013397.","DOI":"10.1109\/TPAMI.2018.2844175"},{"issue":"4","key":"1519_CR20","doi-asserted-by":"publisher","first-page":"815","DOI":"10.1109\/TPAMI.2018.2815688","volume":"41","author":"Q Hou","year":"2019","unstructured":"Hou, Q., et al. (2019). Deeply supervised salient object detection with short connections. IEEE TPAMI., 41(4), 815\u2013828.","journal-title":"IEEE TPAMI."},{"key":"1519_CR21","doi-asserted-by":"crossref","unstructured":"Huang, X., et al. (2015). Salicon: Reducing the semantic gap in saliency prediction by adapting deep neural networks . In: ICCV. pp. 262\u2013270.","DOI":"10.1109\/ICCV.2015.38"},{"issue":"11","key":"1519_CR22","doi-asserted-by":"publisher","first-page":"1254","DOI":"10.1109\/34.730558","volume":"20","author":"L Itti","year":"1998","unstructured":"Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE TPAMI, 20(11), 1254\u20131259.","journal-title":"IEEE TPAMI"},{"key":"1519_CR23","doi-asserted-by":"crossref","unstructured":"Jia, S., Bruce Neil, D.B. (2020). Eml-net: An expandable multi-layer network for saliency prediction . In: Image and vision computing. vol. 95, p. 103887.","DOI":"10.1016\/j.imavis.2020.103887"},{"key":"1519_CR24","unstructured":"Jiang, L., Xu, M., Wang, Z. (2017). Predicting video saliency with object-to-motion CNN and two-layer convolutional LSTM . In: arXiv preprint arXiv:1709.06316."},{"key":"1519_CR25","doi-asserted-by":"crossref","unstructured":"Jiang, L., et al. (2018). Deepvs: A deep learning based video saliency prediction approach . In: ECCV. pp. 602\u2013617.","DOI":"10.1007\/978-3-030-01264-9_37"},{"key":"1519_CR26","doi-asserted-by":"crossref","unstructured":"Jiang, M., et al. (2015). Salicon: Saliency in context . In: Proceedings of the IEEE CVPR. pp. 1072\u20131080.","DOI":"10.1109\/CVPR.2015.7298710"},{"key":"1519_CR27","unstructured":"Judd, T., Durand, F., Torralba, A. (2012). A benchmark of computational models of saliency to predict human fixations."},{"key":"1519_CR28","doi-asserted-by":"crossref","unstructured":"Kan, M., Shan, S., Chen, X. (2015). Bi- Shifting auto-encoder for unsupervised domain adaptation . In: ICCV.","DOI":"10.1109\/ICCV.2015.438"},{"key":"1519_CR29","unstructured":"Kay, W., et al. (2017). The kinetics human action video dataset . In: arXiv preprint arXiv:1705.06950."},{"key":"1519_CR30","unstructured":"Kingma Diederik, P., Ba, J. (2014). Adam: A method for stochastic optimization . In: arXiv preprint arXiv:1412.6980."},{"key":"1519_CR31","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1016\/j.neunet.2020.05.004","volume":"129","author":"A Kroner","year":"2020","unstructured":"Kroner, A., et al. (2020). Contextual encoder-decoder network for visual saliency prediction. Neural Networks, 129, 261\u2013270.","journal-title":"Neural Networks"},{"key":"1519_CR32","doi-asserted-by":"crossref","unstructured":"Kummerer, M., et al. (2017). Understanding lowand high-level contributions to fixation prediction . In: Proceedings of the IEEE ICCV.","DOI":"10.1109\/ICCV.2017.513"},{"key":"1519_CR33","first-page":"1113","volume":"29","author":"Q Lai","year":"2019","unstructured":"Lai, Q., et al. (2019). Video saliency prediction using spatiotemporal residual attentive networks. IEEE TIP, 29, 1113\u20131126.","journal-title":"IEEE TIP"},{"key":"1519_CR34","unstructured":"Li, J., et al. (2018). Unsupervised learning of viewinvariant action representations . In: NIPS. pp. 1254\u20131264."},{"key":"1519_CR35","doi-asserted-by":"crossref","unstructured":"Li, S., Lee M.C. (2007). Fast visual tracking using motion saliency in video . In: ICASSP. IEEE. Vol. 1, pp. I\u20131073.","DOI":"10.1109\/ICASSP.2007.366097"},{"key":"1519_CR36","unstructured":"Li, Y., et al. (2016). Revisiting batch normalization for practical domain adaptation . In: arXiv preprint arXiv:1603.04779."},{"key":"1519_CR37","doi-asserted-by":"crossref","unstructured":"Lim, M.K., et al. (2014). Crowd saliency detection via global similarity structure . In: ICPR. IEEE. pp. 3957\u20133962.","DOI":"10.1109\/ICPR.2014.678"},{"key":"1519_CR38","unstructured":"Linardos, P., et al. (2019). Simple vs complex temporal recurrences for video saliency prediction . In: arXiv preprint arXiv:1907.01869."},{"issue":"2","key":"1519_CR39","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1109\/TPAMI.2010.70","volume":"33","author":"T Liu","year":"2010","unstructured":"Liu, T., et al. (2010). Learning to detect a salient object. IEEE TPAMI, 33(2), 353\u2013367.","journal-title":"IEEE TPAMI"},{"key":"1519_CR40","unstructured":"Long, M., et al. (2015). Learning transferable features with deep adaptation networks . In: ICML. PMLR. pp. 97\u2013105."},{"key":"1519_CR41","doi-asserted-by":"crossref","unstructured":"Lu, L., et al. (2017). Crowd behavior understanding through SIOF feature analysis . In: ICAC. IEEE. pp. 1\u20136.","DOI":"10.23919\/IConAC.2017.8082086"},{"key":"1519_CR42","doi-asserted-by":"crossref","unstructured":"Marszalek, M., Laptev, I., Schmid C. (2009). Actions in context . In: CVPR. IEEE. pp. 2929\u20132936.","DOI":"10.1109\/CVPR.2009.5206557"},{"issue":"7","key":"1519_CR43","doi-asserted-by":"publisher","first-page":"1408","DOI":"10.1109\/TPAMI.2014.2366154","volume":"37","author":"S Mathe","year":"2014","unstructured":"Mathe, S., & Sminchisescu, C. (2014). Actions in the eye: Dynamic gaze datasets and learnt saliency models for visual recognition. IEEE TPAMI, 37(7), 1408\u20131424.","journal-title":"IEEE TPAMI"},{"key":"1519_CR44","doi-asserted-by":"crossref","unstructured":"Min, K., Corso, J.J. (2019). TASED-Net: Temporally- aggregating spatial encoder-decoder network for video saliency detection . In: ICCV. pp. 2394\u20132403.","DOI":"10.1109\/ICCV.2019.00248"},{"key":"1519_CR45","doi-asserted-by":"crossref","unstructured":"Nguyen, T.V., et al. (2013). Static saliency versus dynamic saliency: A comparative study . In: ACM MM. pp. 987\u2013996.","DOI":"10.1145\/2502081.2502128"},{"key":"1519_CR46","doi-asserted-by":"crossref","unstructured":"Noh, H., Hong, S., Han, B. (2015). Learning deconvolution network for semantic segmentation . In: ICCV. pp. 1520\u20131528.","DOI":"10.1109\/ICCV.2015.178"},{"key":"1519_CR47","unstructured":"Pan, J., et al. (2017). Salgan: Visual saliency prediction with generative adversarial networks . In: arXiv preprint arXiv:1701.01081."},{"key":"1519_CR48","doi-asserted-by":"crossref","unstructured":"Pan, J., et al. (2016). Shallow and deep convolutional networks for saliency prediction . In: CVPR. pp. 598\u2013606.","DOI":"10.1109\/CVPR.2016.71"},{"key":"1519_CR49","doi-asserted-by":"crossref","unstructured":"Pan S.J., Yang Q. (2009). A survey on transfer learning. In: IEEE TKDE 22.10, pp. 1345\u20131359.","DOI":"10.1109\/TKDE.2009.191"},{"key":"1519_CR50","doi-asserted-by":"crossref","unstructured":"Redmon, J., et al. (2016). You only look once: Unified, real-time object detection . In: CVPR. pp. 779\u2013788.","DOI":"10.1109\/CVPR.2016.91"},{"key":"1519_CR51","first-page":"234","volume-title":"MICCAI","author":"O Ronneberger","year":"2015","unstructured":"Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. MICCAI (pp. 234\u2013241). Berlin: Springer."},{"key":"1519_CR52","doi-asserted-by":"crossref","unstructured":"Sandler, M., et al. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks . In: CVPR. pp. 4510-4520.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"1519_CR53","unstructured":"Mark, S., et al. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In: CVPR. pp. 4510\u20134520."},{"key":"1519_CR54","unstructured":"Shao, J., Zhou, S.K., Chellappa, R. (2005). Tracking algorithm using background- foreground motion models and multiple cues [surveillance video applications]. In Proceedings (ICASSP\u201905) IEEE International conference on acoustics, speech, and signal processing, Vol. 2, pp. ii\u2013233."},{"key":"1519_CR55","doi-asserted-by":"crossref","unstructured":"Shokri, M., Harati, A., Taba, K. (2020). Salient object detection in video using deep nonlocal neural networks . In: JVCIR vol. 68, p. 102769.","DOI":"10.1016\/j.jvcir.2020.102769"},{"key":"1519_CR56","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1007\/978-3-319-09396-3_9","volume-title":"Computer vision in sports","author":"K Soomro","year":"2014","unstructured":"Soomro, K., & Zamir, A. R. (2014). Action recognition in realistic sports videos. Computer vision in sports (pp. 181\u2013208). Berlin: Springer."},{"key":"1519_CR57","first-page":"443","volume-title":"ECCV","author":"B Sun","year":"2016","unstructured":"Sun, B., & Saenko, K. (2016). Deep coral: Correlation alignment for deep domain adaptation. ECCV (pp. 443\u2013450). Berlin: Springer."},{"issue":"8","key":"1519_CR58","doi-asserted-by":"publisher","first-page":"2900","DOI":"10.1109\/TCYB.2018.2832053","volume":"49","author":"M Sun","year":"2018","unstructured":"Sun, M., et al. (2018). SG-FCN: A motion and memorybased deep learning model for video saliency detection. IEEE Transactions on Cybernetics, 49(8), 2900\u20132911.","journal-title":"IEEE Transactions on Cybernetics"},{"key":"1519_CR59","doi-asserted-by":"crossref","unstructured":"Tang, Y., et al. (2016). Large scale semi-supervised object detection using visual and semantic knowledge transfer. In: CVPR. pp. 2119\u20132128.","DOI":"10.1109\/CVPR.2016.233"},{"key":"1519_CR60","doi-asserted-by":"crossref","unstructured":"Tran, D., et al. (2015). Learning spatiotemporal features with 3d convolutional networks . In: ICCV. pp. 4489\u20134497.","DOI":"10.1109\/ICCV.2015.510"},{"key":"1519_CR61","doi-asserted-by":"crossref","unstructured":"Tzeng, E., et al. (2017). Adversarial discriminative domain adaptation . In: CVPR. pp. 7167\u20137176.","DOI":"10.1109\/CVPR.2017.316"},{"key":"1519_CR62","doi-asserted-by":"crossref","unstructured":"Wang, H., Xu, Y., Han, Y. (2018). Spotting and aggregating salient regions for video captioning. In: ACM MM. pp. 1519\u20131526.","DOI":"10.1145\/3240508.3240677"},{"key":"1519_CR63","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1016\/j.neucom.2018.05.083","volume":"312","author":"M Wang","year":"2018","unstructured":"Wang, M., & Deng, W. (2018). Deep visual domain adaptation: A survey. Neurocomputing, 312, 135\u2013153.","journal-title":"Neurocomputing"},{"issue":"1","key":"1519_CR64","first-page":"38","volume":"27","author":"W Wang","year":"2017","unstructured":"Wang, W., Shen, J., & Shao, L. (2017). Video salient object detection via fully convolutional networks. IEEE TIP, 27(1), 38\u201349.","journal-title":"IEEE TIP"},{"issue":"1","key":"1519_CR65","doi-asserted-by":"publisher","first-page":"220","DOI":"10.1109\/TPAMI.2019.2924417","volume":"43","author":"W Wang","year":"2019","unstructured":"Wang, W., et al. (2019). Revisiting video saliency prediction in the deep learning era. IEEE TPAMI, 43(1), 220\u2013237.","journal-title":"IEEE TPAMI"},{"key":"1519_CR66","doi-asserted-by":"crossref","unstructured":"Wang, W., et al. (2018). Revisiting video saliency: A large-scale benchmark and a new model . In: CVPR, pp. 4894\u20134903.","DOI":"10.1109\/CVPR.2018.00514"},{"key":"1519_CR67","doi-asserted-by":"crossref","unstructured":"Wang, X., et al. (2018). Non-local neural networks . In: CVPR, pp. 7794\u20137803.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"1519_CR68","doi-asserted-by":"crossref","unstructured":"Wang, J., Shen, W. (2018). Deep visual attention prediction. In: IEEE TIP.","DOI":"10.1109\/TIP.2017.2787612"},{"key":"1519_CR69","doi-asserted-by":"crossref","unstructured":"Wu, X., et al. (2020). SalSAC: A video saliency prediction model with shuffled attentions and correlationbased ConvLSTM . In: AAAI, pp. 12410\u201312417.","DOI":"10.1609\/aaai.v34i07.6927"},{"key":"1519_CR70","doi-asserted-by":"crossref","unstructured":"Xie, S., et al. (2018). Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. In: ECCV, pp. 305\u2013321.","DOI":"10.1007\/978-3-030-01267-0_19"},{"issue":"1","key":"1519_CR71","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1007\/s12559-010-9094-8","volume":"3","author":"T Yubing","year":"2011","unstructured":"Yubing, T., et al. (2011). A spatiotemporal saliency model for video surveillance. Cognitive Computation, 3(1), 241\u2013263.","journal-title":"Cognitive Computation"},{"key":"1519_CR72","doi-asserted-by":"crossref","unstructured":"Zhang, J., et al. (2018). Deep unsupervised saliency detection: A multiple noisy labeling perspective . In: CVPR, pp. 9029\u20139038.","DOI":"10.1109\/CVPR.2018.00941"},{"key":"1519_CR73","doi-asserted-by":"crossref","unstructured":"Zhang, P., et al. (2017). Amulet: Aggregating multilevel convolutional features for salient object detection . In: IEEE ICCV.","DOI":"10.1109\/ICCV.2017.31"},{"key":"1519_CR74","doi-asserted-by":"crossref","unstructured":"Zhang, Y., David, P., Gong, B. (2017). Curriculum domain adaptation for semantic segmentation of urban scenes . In: ICCV, pp. 2020\u20132030.","DOI":"10.1109\/ICCV.2017.223"}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-021-01519-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-021-01519-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-021-01519-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,10,29]],"date-time":"2021-10-29T07:19:50Z","timestamp":1635491990000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-021-01519-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,5]]},"references-count":74,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["1519"],"URL":"https:\/\/doi.org\/10.1007\/s11263-021-01519-y","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,5]]},"assertion":[{"value":"3 October 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 August 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 October 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}