{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T16:11:43Z","timestamp":1776442303973,"version":"3.51.2"},"reference-count":98,"publisher":"Springer Science and Business Media LLC","issue":"7-8","license":[{"start":{"date-parts":[[2020,9,13]],"date-time":"2020-09-13T00:00:00Z","timestamp":1599955200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,9,13]],"date-time":"2020-09-13T00:00:00Z","timestamp":1599955200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005758","name":"Universit\u00e0 Politecnica delle Marche","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005758","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Machine Vision and Applications"],"published-print":{"date-parts":[[2020,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In retail environments, understanding how shoppers move about in a store\u2019s spaces and interact with products is very valuable. While the retail environment has several favourable characteristics that support computer vision, such as reasonable lighting, the large number and diversity of products sold, as well as the potential ambiguity of shoppers\u2019 movements, mean that accurately measuring shopper behaviour is still challenging. Over the past years, machine-learning and feature-based tools for people counting, interaction analytics and re-identification have been developed with the aim of learning shopper skills from occlusion-free RGB-D cameras in a top-view configuration. 
However, with the move into the era of multimedia big data, machine-learning approaches have evolved into deep learning approaches, which offer a more powerful and efficient way of dealing with the complexities of human behaviour. This paper introduces a novel VRAI deep learning application that uses three convolutional neural networks to count the people passing or stopping in the camera area, perform top-view re-identification and measure shopper\u2013shelf interactions from a single RGB-D video stream with near-real-time performance. The framework is evaluated on three new, publicly available datasets: TVHeads for people counting, HaDa for shopper\u2013shelf interactions and TVPR2 for people re-identification. The experimental results show that the proposed methods significantly outperform all competing state-of-the-art methods (accuracy of 99.5% on people counting, 92.6% on interaction classification and 74.5% on re-id), yielding diverse and meaningful insights for implicit and extensive shopper behaviour analysis in marketing applications.<\/jats:p>","DOI":"10.1007\/s00138-020-01118-w","type":"journal-article","created":{"date-parts":[[2020,9,13]],"date-time":"2020-09-13T05:02:31Z","timestamp":1599973351000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":33,"title":["Deep understanding of shopper behaviours and interactions using RGB-D 
vision"],"prefix":"10.1007","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5523-7174","authenticated-orcid":false,"given":"Marina","family":"Paolanti","sequence":"first","affiliation":[]},{"given":"Rocco","family":"Pietrini","sequence":"additional","affiliation":[]},{"given":"Adriano","family":"Mancini","sequence":"additional","affiliation":[]},{"given":"Emanuele","family":"Frontoni","sequence":"additional","affiliation":[]},{"given":"Primo","family":"Zingaretti","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,9,13]]},"reference":[{"issue":"2","key":"1118_CR1","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1007\/s10846-017-0674-7","volume":"91","author":"M Paolanti","year":"2018","unstructured":"Paolanti, M., Liciotti, D., Pietrini, R., Mancini, A., Frontoni, E.: Modelling and forecasting customer navigation in intelligent retail environments. J. Intell. Robot. Syst. 91(2), 165\u2013180 (2018)","journal-title":"J. Intell. Robot. Syst."},{"key":"1118_CR2","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1016\/j.patrec.2014.09.013","volume":"53","author":"J Liu","year":"2015","unstructured":"Liu, J., Liu, Y., Zhang, G., Zhu, P., Chen, Y.Q.: Detecting and tracking people in real time with rgb-d camera. Pattern Recognit. Lett. 53, 16\u201323 (2015)","journal-title":"Pattern Recognit. Lett."},{"key":"1118_CR3","doi-asserted-by":"crossref","unstructured":"Liciotti, D., Paolanti, M., Frontoni, E., Zingaretti, P.: People detection and tracking from an rgb-d camera in top-view configuration: review of challenges and applications. In: International Conference on Image Analysis and Processing, pp. 207\u2013218. 
Springer (2017)","DOI":"10.1007\/978-3-319-70742-6_20"},{"key":"1118_CR4","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1007\/978-3-319-12811-5_11","volume-title":"Video Analytics for Audience Measurement","author":"D Liciotti","year":"2014","unstructured":"Liciotti, D., Contigiani, M., Frontoni, E., Mancini, A., Zingaretti, P., Placidi, V.: Shopper analytics: a customer activity recognition system using a distributed rgb-d camera network. In: Distante, C., Battiato, S., Cavallaro, A. (eds.) Video Analytics for Audience Measurement, pp. 146\u2013157. Springer, Cham (2014)"},{"key":"1118_CR5","doi-asserted-by":"crossref","unstructured":"Liciotti, D., Paolanti, M., Pietrini, R., Frontoni, E., Zingaretti, P.: Convolutional networks for semantic heads segmentation using top-view depth data in crowded environment. In: 2018 24th International Conference on Pattern Recognition (ICPR). IEEE (2018)","DOI":"10.1109\/ICPR.2018.8545397"},{"key":"1118_CR6","first-page":"1","volume-title":"Video Analytics. Face and Facial Expression Recognition and Audience Measurement","author":"D Liciotti","year":"2017","unstructured":"Liciotti, D., Paolanti, M., Frontoni, E., Mancini, A., Zingaretti, P.: Person re-identification dataset with rgb-d camera in a top-view configuration. In: Nasrollahi, K., Distante, C., Hua, G., Cavallaro, A., Moeslund, T.B., Battiato, S., Ji, Q. (eds.) Video Analytics. Face and Facial Expression Recognition and Audience Measurement, pp. 1\u201311. Springer, Cham (2017)"},{"issue":"2","key":"1118_CR7","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1016\/S0022-4359(03)00007-1","volume":"79","author":"MJ Arnold","year":"2003","unstructured":"Arnold, M.J., Reynolds, K.E.: Hedonic shopping motivations. J. Retail. 79(2), 77\u201395 (2003)","journal-title":"J. 
Retail."},{"issue":"7553","key":"1118_CR8","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)","journal-title":"Nature"},{"issue":"7","key":"1118_CR9","doi-asserted-by":"publisher","first-page":"1198","DOI":"10.1109\/TPAMI.2007.70770","volume":"30","author":"T Zhao","year":"2008","unstructured":"Zhao, T., Nevatia, R., Wu, B.: Segmentation and tracking of multiple humans in crowded environments. IEEE Trans. Pattern Anal. Mach. Intell. 30(7), 1198\u20131211 (2008)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"issue":"10","key":"1118_CR10","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1002\/rob.20313","volume":"26","author":"R Bogdan Rusu","year":"2009","unstructured":"Bogdan Rusu, R., Sundaresan, A., Morisset, B., Hauser, K., Agrawal, M., Latombe, J.C., Beetz, M.: Leaving flatland: efficient real-time three-dimensional perception and motion planning. J. Field Robot. 26(10), 841\u2013862 (2009). https:\/\/doi.org\/10.1002\/rob.20313","journal-title":"J. Field Robot."},{"key":"1118_CR11","doi-asserted-by":"publisher","unstructured":"Felzenszwalb, P.F.: Learning models for object recognition. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, vol.\u00a01, pp. I\u2013I (2001). https:\/\/doi.org\/10.1109\/CVPR.2001.990647","DOI":"10.1109\/CVPR.2001.990647"},{"issue":"2","key":"1118_CR12","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1007\/s11263-006-0027-7","volume":"75","author":"B Wu","year":"2007","unstructured":"Wu, B., Nevatia, R.: Detection and tracking of multiple, partially occluded humans by bayesian combination of edgelet based part detectors. Int. J. Comput. Vis. 75(2), 247\u2013266 (2007). https:\/\/doi.org\/10.1007\/s11263-006-0027-7","journal-title":"Int. J. Comput. 
Vis."},{"key":"1118_CR13","doi-asserted-by":"publisher","unstructured":"Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), vol.\u00a01, pp. 886\u2013893 (2005). https:\/\/doi.org\/10.1109\/CVPR.2005.177","DOI":"10.1109\/CVPR.2005.177"},{"issue":"10","key":"1118_CR14","doi-asserted-by":"publisher","first-page":"1831","DOI":"10.1109\/TPAMI.2009.109","volume":"31","author":"A Ess","year":"2009","unstructured":"Ess, A., Leibe, B., Schindler, K., van Gool, L.: Robust multiperson tracking from a mobile platform. IEEE Trans. Pattern Anal. Mach. Intell. 31(10), 1831\u20131846 (2009). https:\/\/doi.org\/10.1109\/TPAMI.2009.109","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"issue":"7","key":"1118_CR15","doi-asserted-by":"publisher","first-page":"780","DOI":"10.1109\/34.598236","volume":"19","author":"CR Wren","year":"1997","unstructured":"Wren, C.R., Azarbayejani, A., Darrell, T., Pentland, A.P.: Pfinder: real-time tracking of the human body. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 780\u2013785 (1997)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"issue":"8","key":"1118_CR16","doi-asserted-by":"publisher","first-page":"809","DOI":"10.1109\/34.868683","volume":"22","author":"I Haritaoglu","year":"2000","unstructured":"Haritaoglu, I., Harwood, D., Davis, L.S.: W4: real-time surveillance of people and their activities. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 809\u2013830 (2000). https:\/\/doi.org\/10.1109\/34.868683","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"issue":"9","key":"1118_CR17","doi-asserted-by":"publisher","first-page":"1208","DOI":"10.1109\/TPAMI.2004.73","volume":"26","author":"T Zhao","year":"2004","unstructured":"Zhao, T., Nevatia, R.: Tracking multiple humans in complex situations. IEEE Trans. Pattern Anal. Mach. Intell. 26(9), 1208\u20131221 (2004). 
https:\/\/doi.org\/10.1109\/TPAMI.2004.73","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"issue":"5","key":"1118_CR18","doi-asserted-by":"publisher","first-page":"1318","DOI":"10.1109\/TCYB.2013.2265378","volume":"43","author":"J Han","year":"2013","unstructured":"Han, J., Shao, L., Xu, D., Shotton, J.: Enhanced computer vision with microsoft kinect sensor: a review. IEEE Trans. Cybern. 43(5), 1318\u20131334 (2013). https:\/\/doi.org\/10.1109\/TCYB.2013.2265378","journal-title":"IEEE Trans. Cybern."},{"key":"1118_CR19","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1016\/j.patrec.2016.02.010","volume":"81","author":"M Sturari","year":"2016","unstructured":"Sturari, M., Liciotti, D., Pierdicca, R., Frontoni, E., Mancini, A., Contigiani, M., Zingaretti, P.: Robust and affordable retail customer profiling by vision and radio beacon sensor fusion. Pattern Recognit. Lett. 81, 30\u201340 (2016). https:\/\/doi.org\/10.1016\/j.patrec.2016.02.010","journal-title":"Pattern Recognit. Lett."},{"key":"1118_CR20","doi-asserted-by":"publisher","unstructured":"Dan, B., Kim, Y., Suryanto, Jung, J., Ko, S.: Robust people counting system based on sensor fusion. IEEE Trans. Consum. Electron. 58(3), 1013\u20131021 (2012). https:\/\/doi.org\/10.1109\/TCE.2012.6311350","DOI":"10.1109\/TCE.2012.6311350"},{"issue":"2","key":"1118_CR21","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1109\/TCE.2012.6227420","volume":"58","author":"J Han","year":"2012","unstructured":"Han, J., Pauwels, E.J., de Zeeuw, P.M., de With, P.H.N.: Employing a RGB-D sensor for real-time tracking of humans across multiple re-entries in a smart environment. IEEE Trans. Consum. Electron. 58(2), 255\u2013263 (2012). https:\/\/doi.org\/10.1109\/TCE.2012.6227420","journal-title":"IEEE Trans. Consum. Electron."},{"key":"1118_CR22","doi-asserted-by":"crossref","unstructured":"Hu, L., Hong, C., Zeng, Z., Wang, X.: Two-stream person re-identification with multi-task deep neural networks. 
Machine Vision and Applications pp. 1\u20138 (2018)","DOI":"10.1007\/s00138-018-0915-1"},{"key":"1118_CR23","doi-asserted-by":"publisher","first-page":"402","DOI":"10.1007\/978-3-319-68560-1_36","volume-title":"Image Analysis and Processing\u2014ICIAP 2017","author":"M Paolanti","year":"2017","unstructured":"Paolanti, M., Kaiser, C., Schallner, R., Frontoni, E., Zingaretti, P.: Visual and textual sentiment analysis of brand-related social media pictures using deep convolutional neural networks. In: Battiato, S., Gallo, G., Schettini, R., Stanco, F. (eds.) Image Analysis and Processing\u2014ICIAP 2017, pp. 402\u2013413. Springer, Cham (2017)"},{"key":"1118_CR24","doi-asserted-by":"publisher","first-page":"297","DOI":"10.1007\/978-3-319-10584-0_20","volume-title":"Computer Vision\u2014ECCV 2014","author":"B Hariharan","year":"2014","unstructured":"Hariharan, B., Arbel\u00e1ez, P., Girshick, R., Malik, J.: Simultaneous detection and segmentation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision\u2014ECCV 2014, pp. 297\u2013312. Springer, Cham (2014)"},{"key":"1118_CR25","doi-asserted-by":"crossref","unstructured":"Paolanti, M., Sturari, M., Mancini, A., Zingaretti, P., Frontoni, E.: Mobile robot for retail surveying and inventory using visual and textual analysis of monocular pictures based on deep learning. In: 2017 European Conference on Mobile Robots (ECMR), pp. 1\u20136. IEEE (2017)","DOI":"10.1109\/ECMR.2017.8098666"},{"key":"1118_CR26","unstructured":"Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS\u201912, pp. 1097\u20131105. Curran Associates Inc., USA (2012). http:\/\/dl.acm.org\/citation.cfm?id=2999134.2999257"},{"key":"1118_CR27","unstructured":"Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. 
CoRR abs\/1409.1556 (2014). arXiv:1409.1556"},{"key":"1118_CR28","doi-asserted-by":"publisher","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1\u20139 (2015). https:\/\/doi.org\/10.1109\/CVPR.2015.7298594","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"1118_CR29","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770\u2013778 (2016)","DOI":"10.1109\/CVPR.2016.90"},{"issue":"4","key":"1118_CR30","doi-asserted-by":"publisher","first-page":"640","DOI":"10.1109\/TPAMI.2016.2572683","volume":"39","author":"E Shelhamer","year":"2017","unstructured":"Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640\u2013651 (2017). https:\/\/doi.org\/10.1109\/TPAMI.2016.2572683","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"issue":"2","key":"1118_CR31","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1007\/s13735-017-0141-z","volume":"7","author":"Y Guo","year":"2018","unstructured":"Guo, Y., Liu, Y., Georgiou, T., Lew, M.S.: A review of semantic segmentation using deep neural networks. Int. J. Multimed. Inf. Retr. 7(2), 87\u201393 (2018). https:\/\/doi.org\/10.1007\/s13735-017-0141-z","journal-title":"Int. J. Multimed. Inf. Retr."},{"issue":"2","key":"1118_CR32","doi-asserted-by":"publisher","first-page":"512","DOI":"10.1109\/JIOT.2017.2714181","volume":"5","author":"JW Choi","year":"2017","unstructured":"Choi, J.W., Quan, X., Cho, S.H.: Bi-directional passing people counting system based on ir-uwb radar sensors. IEEE Internet Things J. 
5(2), 512\u2013522 (2017)","journal-title":"IEEE Internet Things J."},{"issue":"3","key":"1118_CR33","doi-asserted-by":"publisher","first-page":"819","DOI":"10.1109\/TCE.2012.6311323","volume":"58","author":"B Mrazovac","year":"2012","unstructured":"Mrazovac, B., Bjelica, M.Z., Kukolj, D., Todorovic, B.M., Samardzija, D.: A human detection method for residential smart energy systems based on zigbee rssi changes. IEEE Trans. Consum. Electron. 58(3), 819\u2013824 (2012)","journal-title":"IEEE Trans. Consum. Electron."},{"issue":"9","key":"1118_CR34","doi-asserted-by":"publisher","first-page":"3991","DOI":"10.1109\/TIE.2012.2206330","volume":"60","author":"J Garc\u00eda","year":"2012","unstructured":"Garc\u00eda, J., Gardel, A., Bravo, I., L\u00e1zaro, J.L., Mart\u00ednez, M., Rodr\u00edguez, D.: Directional people counter based on head tracking. IEEE Trans. Ind. Electron. 60(9), 3991\u20134000 (2012)","journal-title":"IEEE Trans. Ind. Electron."},{"key":"1118_CR35","doi-asserted-by":"crossref","unstructured":"Wang, C., Zhang, H., Yang, L., Liu, S., Cao, X.: Deep people counting in extremely dense crowds. In: Proceedings of the 23rd ACM international conference on Multimedia, pp. 1299\u20131302 (2015)","DOI":"10.1145\/2733373.2806337"},{"key":"1118_CR36","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1016\/j.engappai.2015.04.006","volume":"43","author":"M Fu","year":"2015","unstructured":"Fu, M., Xu, P., Li, X., Liu, Q., Ye, M., Zhu, C.: Fast crowd density estimation with convolutional neural networks. Eng. Appl. Artif. Intell. 43, 81\u201388 (2015)","journal-title":"Eng. Appl. Artif. Intell."},{"key":"1118_CR37","unstructured":"Zhang, C., Li, H., Wang, X., Yang, X.: Cross-scene crowd counting via deep convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
833\u2013841 (2015)"},{"key":"1118_CR38","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1016\/j.patrec.2016.05.033","volume":"81","author":"L Del Pizzo","year":"2016","unstructured":"Del Pizzo, L., Foggia, P., Greco, A., Percannella, G., Vento, M.: Counting people by RGB or depth overhead cameras. Pattern Recognit. Lett. 81, 41\u201350 (2016)","journal-title":"Pattern Recognit. Lett."},{"issue":"8","key":"1118_CR39","doi-asserted-by":"publisher","first-page":"1788","DOI":"10.1109\/TCSVT.2016.2637379","volume":"28","author":"B Sheng","year":"2016","unstructured":"Sheng, B., Shen, C., Lin, G., Li, J., Yang, W., Sun, C.: Crowd counting via weighted vlad on a dense attribute feature map. IEEE Trans. Circuits Syst. Video Technol. 28(8), 1788\u20131797 (2016)","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"1118_CR40","doi-asserted-by":"crossref","unstructured":"Shang, C., Ai, H., Bai, B.: End-to-end crowd counting via joint learning local and global count. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 1215\u20131219. IEEE (2016)","DOI":"10.1109\/ICIP.2016.7532551"},{"key":"1118_CR41","unstructured":"Yao, H., Han, K., Wan, W., Hou, L.: Deep spatial regression model for image crowd counting. arXiv preprint arXiv:1710.09757 (2017)"},{"key":"1118_CR42","doi-asserted-by":"crossref","unstructured":"Fang, Y., Gao, S., Li, J., Luo, W., He, L., Hu, B.: Multi-level feature fusion based locality-constrained spatial transformer network for video crowd counting. Neurocomputing (2020)","DOI":"10.1016\/j.neucom.2020.01.087"},{"key":"1118_CR43","doi-asserted-by":"crossref","unstructured":"Wang, Q., Gao, J., Lin, W., Yuan, Y.: Learning from synthetic data for crowd counting in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
8198\u20138207 (2019)","DOI":"10.1109\/CVPR.2019.00839"},{"key":"1118_CR44","doi-asserted-by":"crossref","unstructured":"Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1\u20138. IEEE (2008)","DOI":"10.1109\/CVPR.2008.4587756"},{"key":"1118_CR45","doi-asserted-by":"crossref","unstructured":"Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human-object interaction activities. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 17\u201324. IEEE (2010)","DOI":"10.1109\/CVPR.2010.5540235"},{"issue":"3","key":"1118_CR46","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1007\/s11263-015-0851-8","volume":"119","author":"M Rohrbach","year":"2016","unstructured":"Rohrbach, M., Rohrbach, A., Regneri, M., Amin, S., Andriluka, M., Pinkal, M., Schiele, B.: Recognizing fine-grained and composite activities using hand-centric features and script data. Int. J. Comput. Vis. 119(3), 346\u2013373 (2016)","journal-title":"Int. J. Comput. Vis."},{"key":"1118_CR47","unstructured":"Soomro, K., Zamir, A.R., Shah, M.: Ucf101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)"},{"key":"1118_CR48","doi-asserted-by":"crossref","unstructured":"Kim, S., Yun, K., Park, J., Choi, J.Y.: Skeleton-based action recognition of people handling objects. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 61\u201370. IEEE (2019)","DOI":"10.1109\/WACV.2019.00014"},{"key":"1118_CR49","unstructured":"Moghaddam, M.M.K., Abbasnejad, E., Shi, J.: Follow the attention: Combining partial pose and object motion for fine-grained action detection. 
arXiv preprint arXiv:1905.04430 (2019)"},{"issue":"8","key":"1118_CR50","doi-asserted-by":"publisher","first-page":"1629","DOI":"10.1109\/TPAMI.2014.2369055","volume":"37","author":"G Lisanti","year":"2015","unstructured":"Lisanti, G., Masi, I., Bagdanov, A.D., Del Bimbo, A.: Person re-identification by iterative re-weighted sparse ranking. IEEE Trans. Pattern Anal. Mach. Intell. 37(8), 1629\u20131642 (2015). https:\/\/doi.org\/10.1109\/TPAMI.2014.2369055","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"1118_CR51","unstructured":"Gray, D., Brennan, S., Tao, H.: Evaluating appearance models for recognition, reacquisition, and tracking. In: IEEE International Workshop on Performance Evaluation for Tracking and Surveillance, Rio de Janeiro (2007)"},{"key":"1118_CR52","doi-asserted-by":"publisher","first-page":"262","DOI":"10.1007\/978-3-540-88682-2_21","volume-title":"Computer Vision\u2014ECCV 2008","author":"D Gray","year":"2008","unstructured":"Gray, D., Tao, H.: Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) Computer Vision\u2014ECCV 2008, pp. 262\u2013275. Springer, Berlin (2008)"},{"key":"1118_CR53","unstructured":"Madden, C., Piccardi, M.: Height measurement as a session-based biometric for people matching across disjoint camera views. In: Image and Vision Computing New Zealand, p.\u00a029 (2005)"},{"issue":"4","key":"1118_CR54","doi-asserted-by":"publisher","first-page":"788","DOI":"10.1109\/TCSVT.2015.2424056","volume":"26","author":"F Pala","year":"2016","unstructured":"Pala, F., Satta, R., Fumera, G., Roli, F.: Multimodal person reidentification using rgb-d cameras. IEEE Trans. Circuits Syst. Video Technol. 26(4), 788\u2013799 (2016). https:\/\/doi.org\/10.1109\/TCSVT.2015.2424056","journal-title":"IEEE Trans. Circuits Syst. 
Video Technol."},{"key":"1118_CR55","doi-asserted-by":"publisher","unstructured":"Cheng, D.S., Cristani, M., Stoppa, M., Bazzani, L., Murino, V.: Custom pictorial structures for re-identification. In: Proceedings of the British Machine Vision Conference, pp. 68.1\u201368.11. BMVA Press (2011). https:\/\/doi.org\/10.5244\/C.25.68","DOI":"10.5244\/C.25.68"},{"key":"1118_CR56","doi-asserted-by":"publisher","unstructured":"B\u0105k, S., Corvee, E., Br\u00e9mond, F., Thonnat, M.: Multiple-shot human re-identification by mean riemannian covariance grid. In: 2011 8th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 179\u2013184 (2011). https:\/\/doi.org\/10.1109\/AVSS.2011.6027316","DOI":"10.1109\/AVSS.2011.6027316"},{"issue":"10","key":"1118_CR57","doi-asserted-by":"publisher","first-page":"3471","DOI":"10.3390\/s18103471","volume":"18","author":"M Paolanti","year":"2018","unstructured":"Paolanti, M., Romeo, L., Liciotti, D., Pietrini, R., Cenci, A., Frontoni, E., Zingaretti, P.: Person re-identification with RGB-D camera in top-view configuration through multiple nearest neighbor classifiers and neighborhood component features selection. Sensors 18(10), 3471 (2018)","journal-title":"Sensors"},{"key":"1118_CR58","doi-asserted-by":"crossref","unstructured":"Haque, A., Alahi, A., Fei-Fei, L.: Recurrent attention models for depth-based person identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1229\u20131238 (2016)","DOI":"10.1109\/CVPR.2016.138"},{"key":"1118_CR59","doi-asserted-by":"crossref","unstructured":"Lejbolle, A.R., Nasrollahi, K., Krogh, B., Moeslund, T.B.: Multimodal neural network for overhead person re-identification. In: 2017 International Conference of the Biometrics Special Interest Group (BIOSIG), pp. 1\u20135. 
IEEE (2017)","DOI":"10.23919\/BIOSIG.2017.8053514"},{"key":"1118_CR60","doi-asserted-by":"crossref","unstructured":"Lejbolle, A.R., Krogh, B., Nasrollahi, K., Moeslund, T.B.: Attention in multimodal neural networks for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 179\u2013187 (2018)","DOI":"10.1109\/CVPRW.2018.00055"},{"key":"1118_CR61","doi-asserted-by":"crossref","unstructured":"Lejb\u00f8lle, A.R., Nasrollahi, K., Krogh, B., Moeslund, T.B.: Person re-identification using spatial and layer-wise attention. IEEE Transactions on Information Forensics and Security (2019)","DOI":"10.1109\/TIFS.2019.2938870"},{"key":"1118_CR62","doi-asserted-by":"crossref","unstructured":"Liciotti, D., Frontoni, E., Mancini, A., Zingaretti, P.: Pervasive system for consumer behaviour analysis in retail environments. In: Video Analytics. Face and Facial Expression Recognition and Audience Measurement, pp. 12\u201323. Springer (2016)","DOI":"10.1007\/978-3-319-56687-0_2"},{"key":"1118_CR63","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: arXiv preprint arXiv:1505.04597 (2015)","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"1118_CR64","doi-asserted-by":"crossref","unstructured":"Tompson, J., Goroshin, R., Jain, A., LeCun, Y., Bregler, C.: Efficient object localization using convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 648\u2013656 (2015)","DOI":"10.1109\/CVPR.2015.7298664"},{"key":"1118_CR65","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
2818\u20132826 (2016)","DOI":"10.1109\/CVPR.2016.308"},{"key":"1118_CR66","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet and the impact of residual connections on learning. In: AAAI, vol.\u00a04, p.\u00a012 (2017)","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"1118_CR67","unstructured":"Lin, M., Chen, Q., Yan, S.: Network in network. arXiv preprint arXiv:1312.4400 (2013)"},{"key":"1118_CR68","unstructured":"Liao, Z., Carneiro, G.: On the importance of normalisation layers in deep learning with piecewise linear activation units. Methods for Understanding and Improving Deep Learning Classification Models p.\u00a058 (2017)"},{"key":"1118_CR69","doi-asserted-by":"crossref","unstructured":"Liciotti, D., Paolanti, M., Frontoni, E., Mancini, A., Zingaretti, P.: Person re-identification dataset with rgb-d camera in a top-view configuration. In: Video Analytics. Face and Facial Expression Recognition and Audience Measurement, pp. 1\u201311. Springer (2016)","DOI":"10.1007\/978-3-319-56687-0_1"},{"key":"1118_CR70","unstructured":"Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: A deep convolutional encoder-decoder architecture for image segmentation. In: CoRR (2015)"},{"key":"1118_CR71","unstructured":"Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: arXiv preprint arXiv:1605.07648 (2016)"},{"key":"1118_CR72","doi-asserted-by":"crossref","unstructured":"Ravishankar, H., Venkataramani, R., Thiruvenkadam, S., Sudhakar, P., Vaidya, V.: Learning and incorporating shape models for semantic segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 
203\u2013211 (2017)","DOI":"10.1007\/978-3-319-66182-7_24"},{"key":"1118_CR73","first-page":"547","volume":"37","author":"P Jaccard","year":"1901","unstructured":"Jaccard, P.: \u00c9tude comparative de la distribution florale dans une portion des alpes et des jura. Bull. Soc. Vaudoise Sci. Nat. 37, 547\u2013579 (1901)","journal-title":"Bull. Soc. Vaudoise Sci. Nat."},{"issue":"3","key":"1118_CR74","doi-asserted-by":"publisher","first-page":"297","DOI":"10.2307\/1932409","volume":"26","author":"LR Dice","year":"1945","unstructured":"Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3), 297\u2013302 (1945)","journal-title":"Ecology"},{"key":"1118_CR75","doi-asserted-by":"crossref","unstructured":"Frontoni, E., Paolanti, M., Pietrini, R.: People counting in crowded environment and re-identification. In: RGB-D Image Analysis and Processing, pp. 397\u2013425. Springer (2019)","DOI":"10.1007\/978-3-030-28603-3_18"},{"key":"1118_CR76","doi-asserted-by":"publisher","unstructured":"Zhang, X., Yan, J., Feng, S., Lei, Z., Yi, D., Li, S.Z.: Water filling: Unsupervised people counting via vertical Kinect sensor. In: 2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance, pp. 215\u2013220 (2012). https:\/\/doi.org\/10.1109\/AVSS.2012.82","DOI":"10.1109\/AVSS.2012.82"},{"key":"1118_CR77","unstructured":"LeCun, Y., Boser, B.E., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.E., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, pp. 396\u2013404 (1990)"},{"key":"1118_CR78","doi-asserted-by":"crossref","unstructured":"Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on Multimedia, pp. 675\u2013678. 
ACM (2014)","DOI":"10.1145\/2647868.2654889"},{"key":"1118_CR79","doi-asserted-by":"crossref","unstructured":"Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. arXiv preprint arXiv:1707.07012 (2017)","DOI":"10.1109\/CVPR.2018.00907"},{"key":"1118_CR80","doi-asserted-by":"crossref","unstructured":"Chollet, F.: Xception: Deep learning with depthwise separable convolutions. arXiv preprint arXiv:1610.02357 (2017)","DOI":"10.1109\/CVPR.2017.195"},{"issue":"3","key":"1118_CR81","doi-asserted-by":"publisher","first-page":"e34","DOI":"10.1093\/nar\/gnh026","volume":"32","author":"TH B\u00f8","year":"2004","unstructured":"B\u00f8, T.H., Dysvik, B., Jonassen, I.: Lsimpute: accurate estimation of missing values in microarray data with least squares methods. Nucleic Acids Res. 32(3), e34\u2013e34 (2004)","journal-title":"Nucleic Acids Res."},{"issue":"3","key":"1118_CR82","first-page":"273","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273\u2013297 (1995)","journal-title":"Mach. Learn."},{"issue":"1","key":"1118_CR83","first-page":"81","volume":"1","author":"JR Quinlan","year":"1986","unstructured":"Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81\u2013106 (1986)","journal-title":"Mach. Learn."},{"issue":"1","key":"1118_CR84","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman, L.: Random forests. Mach. Learn. 45(1), 5\u201332 (2001)","journal-title":"Mach. Learn."},{"key":"1118_CR85","unstructured":"Rish, I.: An empirical study of the naive bayes classifier. In: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, vol.\u00a03, pp. 41\u201346. 
IBM New York (2001)"},{"key":"1118_CR86","doi-asserted-by":"crossref","unstructured":"Yuan, Y., Chen, W., Yang, Y., Wang, Z.: In defense of the triplet loss again: Learning robust person re-identification with fast approximated triplet loss and label distillation. arXiv preprint arXiv:1912.07863 (2019)","DOI":"10.1109\/CVPRW50498.2020.00185"},{"key":"1118_CR87","doi-asserted-by":"crossref","unstructured":"Zhong, Z., Jin, L., Xie, Z.: High performance offline handwritten Chinese character recognition using GoogLeNet and directional feature maps. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 846\u2013850. IEEE (2015)","DOI":"10.1109\/ICDAR.2015.7333881"},{"key":"1118_CR88","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248\u2013255. IEEE (2009)","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"1118_CR89","unstructured":"Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737 (2017)"},{"key":"1118_CR90","doi-asserted-by":"crossref","unstructured":"Hamdoun, O., Moutarde, F., Stanciulescu, B., Steux, B.: Person re-identification in multi-camera system by signature based on interest point descriptors collected on short video sequences. In: Second ACM\/IEEE International Conference on Distributed Smart Cameras, 2008. ICDSC 2008. IEEE, pp. 1\u20136 (2008)","DOI":"10.1109\/ICDSC.2008.4635689"},{"key":"1118_CR91","doi-asserted-by":"crossref","unstructured":"Li, Y., Wu, Z., Radke, R.: Multi-shot re-identification with random-projection-based random forests. In: 2015 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, pp. 
373\u2013380 (2015)","DOI":"10.1109\/WACV.2015.56"},{"key":"1118_CR92","doi-asserted-by":"crossref","unstructured":"Bay, S.D.: Nearest neighbor classification from multiple feature subsets. In: Intelligent Data Analysis, pp. 191\u2013209 (1999)","DOI":"10.1016\/S1088-467X(99)00018-9"},{"key":"1118_CR93","doi-asserted-by":"crossref","unstructured":"Prosser, B., Zheng, W., Gong, S., Xiang, T.: Person re-identification by support vector ranking. In: BMVC, vol.\u00a02, p.\u00a06 (2010)","DOI":"10.5244\/C.24.21"},{"key":"1118_CR94","doi-asserted-by":"publisher","first-page":"182","DOI":"10.1016\/j.jretconser.2017.02.003","volume":"37","author":"H Sorensen","year":"2017","unstructured":"Sorensen, H., Bogomolova, S., Anderson, K., Trinh, G., Sharp, A., Kennedy, R., Page, B., Wright, M.: Fundamental patterns of in-store shopper behavior. J. Retail. Consum. Serv. 37, 182\u2013194 (2017). https:\/\/doi.org\/10.1016\/j.jretconser.2017.02.003","journal-title":"J. Retail. Consum. Serv."},{"issue":"4\/5","key":"1118_CR95","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1108\/eb028133","volume":"14","author":"H Phillips","year":"1991","unstructured":"Phillips, H., Bradshaw, R.: Camera tracking: a new tool for market research and retail management. Manag. Res. News 14(4\/5), 20\u201322 (1991). https:\/\/doi.org\/10.1108\/eb028133","journal-title":"Manag. Res. News"},{"key":"1118_CR96","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1016\/j.apgeog.2016.11.005","volume":"78","author":"D Oosterlinck","year":"2017","unstructured":"Oosterlinck, D., Benoit, D.F., Baecke, P., Van de Weghe, N.: Bluetooth tracking of humans in an indoor environment: an application to shopping mall visits. Appl. Geogr. 78, 55\u201365 (2017). https:\/\/doi.org\/10.1016\/j.apgeog.2016.11.005","journal-title":"Appl. Geogr."},{"key":"1118_CR97","doi-asserted-by":"publisher","unstructured":"Roedel, E.: Fisher, R.A.: Statistical methods for research workers, 14. 
Aufl., Oliver & Boyd, Edinburgh, London 1970. xiii, 362 S., 12 Abb., 74 Tab., 40 S. Biometrische Zeitschrift 13(6), 429\u2013430 (1970). https:\/\/doi.org\/10.1002\/bimj.19710130623. https:\/\/onlinelibrary.wiley.com\/doi\/abs\/10.1002\/bimj.19710130623","DOI":"10.1002\/bimj.19710130623"},{"issue":"1","key":"1118_CR98","doi-asserted-by":"publisher","first-page":"101","DOI":"10.2307\/3001666","volume":"10","author":"WG Cochran","year":"1954","unstructured":"Cochran, W.G.: The combination of estimates from different experiments. Biometrics 10(1), 101\u2013129 (1954)","journal-title":"Biometrics"}],"container-title":["Machine Vision and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00138-020-01118-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00138-020-01118-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00138-020-01118-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,7]],"date-time":"2023-10-07T07:45:08Z","timestamp":1696664708000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00138-020-01118-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,13]]},"references-count":98,"journal-issue":{"issue":"7-8","published-print":{"date-parts":[[2020,11]]}},"alternative-id":["1118"],"URL":"https:\/\/doi.org\/10.1007\/s00138-020-01118-w","relation":{},"ISSN":["0932-8092","1432-1769"],"issn-type":[{"value":"0932-8092","type":"print"},{"value":"1432-1769","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9,13]]},"assertion":[{"value":"14 April 
2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 August 2020","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 August 2020","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 September 2020","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"66"}}