{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T15:17:09Z","timestamp":1773415029953,"version":"3.50.1"},"reference-count":98,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2020,12,10]],"date-time":"2020-12-10T00:00:00Z","timestamp":1607558400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Understanding the behaviors and intentions of humans is still one of the main challenges for vehicle autonomy. More specifically, inferring the intentions and actions of vulnerable actors, namely pedestrians, in complex situations such as urban traffic scenes remains a difficult task and a blocking point towards more automated vehicles. Answering the question \u201cIs the pedestrian going to cross?\u201d is a good starting point in order to advance in the quest to the fifth level of autonomous driving. In this paper, we address the problem of real-time discrete intention prediction of pedestrians in urban traffic environments by linking the dynamics of a pedestrian\u2019s skeleton to an intention. Hence, we propose SPI-Net (Skeleton-based Pedestrian Intention network): a representation-focused multi-branch network combining features from 2D pedestrian body poses for the prediction of pedestrians\u2019 discrete intentions. 
Experimental results show that SPI-Net achieved 94.4% accuracy in pedestrian crossing prediction on the JAAD data set while being efficient for real-time scenarios since SPI-Net can reach around one inference every 0.25 ms on one GPU (i.e., RTX 2080ti), or every 0.67 ms on one CPU (i.e., Intel Core i7 8700K).<\/jats:p>","DOI":"10.3390\/a13120331","type":"journal-article","created":{"date-parts":[[2020,12,10]],"date-time":"2020-12-10T08:59:34Z","timestamp":1607590774000},"page":"331","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":33,"title":["Predicting Intentions of Pedestrians from 2D Skeletal Pose Sequences with a Representation-Focused Multi-Branch Deep Learning Network"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9519-3591","authenticated-orcid":false,"given":"Joseph","family":"Gesnouin","sequence":"first","affiliation":[{"name":"Institut VEDECOM\u2014Versailles, 78000 Versailles, France"},{"name":"Centre de Robotique, MINES ParisTech, Universit\u00e9 PSL, 75006 Paris, France"}]},{"given":"Steve","family":"Pechberti","sequence":"additional","affiliation":[{"name":"Institut VEDECOM\u2014Versailles, 78000 Versailles, France"}]},{"given":"Guillaume","family":"Bresson","sequence":"additional","affiliation":[{"name":"Institut VEDECOM\u2014Versailles, 78000 Versailles, France"}]},{"given":"Bogdan","family":"Stanciulescu","sequence":"additional","affiliation":[{"name":"Centre de Robotique, MINES ParisTech, Universit\u00e9 PSL, 75006 Paris, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4799-7285","authenticated-orcid":false,"given":"Fabien","family":"Moutarde","sequence":"additional","affiliation":[{"name":"Centre de Robotique, MINES ParisTech, Universit\u00e9 PSL, 75006 Paris, France"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Donahue, J., Anne Hendricks, L., 
Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., and Darrell, T. (2015, January 7\u201312). Long-term recurrent convolutional networks for visual recognition and description. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2014). Learning Spatiotemporal Features with 3D Convolutional Networks. arXiv.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1510","DOI":"10.1109\/TPAMI.2017.2712608","article-title":"Long-term temporal convolutions for action recognition","volume":"40","author":"Varol","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Wu, C.Y., Zaheer, M., Hu, H., Manmatha, R., Smola, A.J., and Kr\u00e4henb\u00fchl, P. (2018, January 18\u201323). Compressed Video Action Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00631"},{"key":"ref_5","unstructured":"Simonyan, K., and Zisserman, A. (2020, December 09). Two-stream convolutional networks for action recognition in videos. Advances in Neural Information Processing Systems, 2014; pp. 568\u2013576. Available online: http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.749.5720&rep=rep1&type=pdf."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Zhang, B., Wang, L., Wang, Z., Qiao, Y., and Wang, H. (2016, January 27\u201330). Real-time action recognition with enhanced motion vector CNNs. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.297"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Sevilla-Lara, L., Liao, Y., G\u00fcney, F., Jampani, V., Geiger, A., and Black, M.J. On the integration of optical flow and action recognition. Proceedings of the German Conference on Pattern Recognition, Stuttgart, Germany, 9\u201312 October 2018.","DOI":"10.1007\/978-3-030-12939-2_20"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Pop, D., Rogozan, A., Chatelain, C., Nashashibi, F., and Bensrhair, A. (2019). Multi-Task Deep Learning for Pedestrian Detection, Action Recognition and Time to Cross Prediction. IEEE Access.","DOI":"10.1109\/ACCESS.2019.2944792"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Vemulapalli, R., Arrate, F., and Chellappa, R. (2014, January 23\u201328). Human action recognition by representing 3d skeletons as points in a lie group. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.82"},{"key":"ref_10","unstructured":"Du, Y., Wang, W., and Wang, L. (2015, January 7\u201312). Hierarchical recurrent neural network for skeleton based action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Liu, J., Shahroudy, A., Xu, D., and Wang, G. (2016). Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition. arXiv.","DOI":"10.1007\/978-3-319-46487-9_50"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Yan, S., Xiong, Y., and Lin, D. (2018). Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. arXiv.","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Chen, Y., Tian, Y., and He, M. (2020). 
Monocular Human Pose Estimation: A Survey of Deep Learning-based Methods. arXiv.","DOI":"10.1016\/j.cviu.2019.102897"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., and Sun, J. (2017). Cascaded Pyramid Network for Multi-Person Pose Estimation. arXiv.","DOI":"10.1109\/CVPR.2018.00742"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Rasouli, A., Kotseruba, I., and Tsotsos, J.K. (2017, January 22\u201329). Are they going to cross?. A benchmark dataset and baseline for pedestrian crosswalk behavior. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.33"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Rasouli, A., Kotseruba, I., and Tsotsos, J.K. (2017, January 11\u201314). Agreeing to cross: How drivers and pedestrians communicate. Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Redondo Beach, CA, USA.","DOI":"10.1109\/IVS.2017.7995730"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv.","DOI":"10.3115\/v1\/D14-1179"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Baccouche, M., Mamalet, F., Wolf, C., Garcia, C., and Baskurt, A. (2011). Sequential deep learning for human action recognition. 
International Workshop on Human Behavior Understanding, Springer.","DOI":"10.1007\/978-3-642-25446-8_4"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1109\/TMM.2018.2856094","article-title":"Exploiting recurrent neural networks and leap motion controller for the recognition of sign language and semaphoric hand gestures","volume":"21","author":"Avola","year":"2018","journal-title":"IEEE Trans. Multimed."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhang, S., Liu, X., and Xiao, J. On geometric features for skeleton-based action recognition using multilayer lstm networks. Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 24\u201331 March 2017.","DOI":"10.1109\/WACV.2017.24"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Shukla, P., Biswas, K.K., and Kalra, P.K. Recurrent neural network based action recognition from 3D skeleton data. Proceedings of the 2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Jaipur, India, 4\u20137 December 2017.","DOI":"10.1109\/SITIS.2017.63"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Shahroudy, A., Liu, J., Ng, T.T., and Wang, G. (2016, January 27\u201330). Ntu rgb+ d: A large scale dataset for 3d human activity analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.115"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhang, P., Lan, C., Xing, J., Zeng, W., Xue, J., and Zheng, N. (2017, January 22\u201329). View adaptive recurrent neural networks for high performance human action recognition from skeleton data. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.233"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Amato, G., Connor, R., Falchi, F., and Gennaro, C. (2015). 
Motion Images: An Effective Representation of Motion Capture Data for Similarity Search. Similarity Search and Applications, Springer International Publishing.","DOI":"10.1007\/978-3-319-25087-8"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"12073","DOI":"10.1007\/s11042-017-4859-7","article-title":"Effective and Efficient Similarity Searching in Motion Capture Data","volume":"77","author":"Sedmidubsky","year":"2018","journal-title":"Multimed. Tools Appl."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ke, Q., Bennamoun, M., An, S., Sohel, F., and Boussaid, F. (2017, January 21\u201326). A new representation of skeleton sequences for 3d action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.486"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1049\/iet-cvi.2018.5014","article-title":"Learning to recognise 3D human action from a new skeleton-based representation using deep convolutional neural networks","volume":"13","author":"Pham","year":"2018","journal-title":"IET Comput. Vis."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3247","DOI":"10.1109\/TCSVT.2018.2879913","article-title":"Skeleton-Based Action Recognition with Gated Convolutional Neural Networks","volume":"29","author":"Cao","year":"2018","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ludl, D., Gulde, T., and Curio, C. Simple yet efficient real-time pose-based action recognition. 
Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27\u201330 October 2019.","DOI":"10.1109\/ITSC.2019.8917128"},{"key":"ref_32","unstructured":"Bai, S., Kolter, J.Z., and Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Devineau, G., Moutarde, F., Xi, W., and Yang, J. Deep learning for hand gesture recognition on skeletal data. Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi\u2019an, China, 15\u201319 May 2018.","DOI":"10.1109\/FG.2018.00025"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Weng, J., Liu, M., Jiang, X., and Yuan, J. (2018, January 8\u201314). Deformable pose traversal convolution for 3d action and gesture recognition. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_9"},{"key":"ref_35","unstructured":"Li, C., Wang, P., Wang, S., Hou, Y., and Li, W. Skeleton-based action recognition using LSTM and CNN. Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong, China, 10\u201314 July 2017."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1109\/ACCESS.2017.2778011","article-title":"Action recognition in video sequences using deep bi-directional LSTM with CNN features","volume":"6","author":"Ullah","year":"2017","journal-title":"IEEE Access"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"2673","DOI":"10.1109\/78.650093","article-title":"Bidirectional recurrent neural networks","volume":"45","author":"Schuster","year":"1997","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Maghoumi, M., and LaViola Jr, J.J. DeepGRU: Deep gesture recognition utility. 
Proceedings of the International Symposium on Visual Computing, Lake Tahoe, NV, USA, 7\u20139 October 2019.","DOI":"10.1007\/978-3-030-33720-9_2"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Song, S., Lan, C., Xing, J., Zeng, W., and Liu, J. (2017, January 4\u20139). An end-to-end spatio-temporal attention model for human action recognition from skeleton data. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11212"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1109\/TMM.2018.2859620","article-title":"Attention-Based Multiview Re-Observation Fusion Network for Skeletal Action Recognition","volume":"21","author":"Fan","year":"2019","journal-title":"IEEE Trans. Multimed."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Hou, J., Wang, G., Chen, X., Xue, J.H., Zhu, R., and Yang, H. (2018, January 8\u201314). Spatial-temporal attention res-TCN for skeleton-based dynamic hand gesture recognition. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-11024-6_18"},{"key":"ref_42","first-page":"729","article-title":"A new model for learning in graph domains","volume":"Volume 2","author":"Gori","year":"2005","journal-title":"Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, 2005, Montreal, QC, Canada, 31 July\u20134 August 2005"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/TNN.2008.2005605","article-title":"The graph neural network model","volume":"20","author":"Scarselli","year":"2008","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/MSP.2017.2693418","article-title":"Geometric deep learning: Going beyond euclidean data","volume":"34","author":"Bronstein","year":"2017","journal-title":"IEEE Signal Process. 
Mag."},{"key":"ref_45","unstructured":"Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P.S. (2019). A comprehensive survey on graph neural networks. arXiv."},{"key":"ref_46","unstructured":"Zhang, X., Xu, C., Tian, X., and Tao, D. (2018). Graph Edge Convolutional Neural Networks for Skeleton Based Action Recognition. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Yan, S., Xiong, Y., and Lin, D. (2018, January 2\u20137). Spatial temporal graph convolutional networks for skeleton-based action recognition. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Li, C., Cui, Z., Zheng, W., Xu, C., and Yang, J. (2018). Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition. arXiv.","DOI":"10.1609\/aaai.v32i1.11776"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Si, C., Chen, W., Wang, W., Wang, L., and Tan, T. (2019, January 15\u201320). An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00132"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Varytimidis, D., Alonso-Fernandez, F., Duran, B., and Englund, C. (2018). Action and intention recognition of pedestrians in urban traffic. arXiv.","DOI":"10.1109\/SITIS.2018.00109"},{"key":"ref_51","unstructured":"Pereira, F., Burges, C.J.C., Bottou, L., and Weinberger, K.Q. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25, Curran Associates, Inc."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Saleh, K., Hossny, M., and Nahavandi, S. (2019). Real-time Intent Prediction of Pedestrians for Autonomous Ground Vehicles via Spatio-Temporal DenseNet. 
arXiv.","DOI":"10.1109\/ICRA.2019.8793991"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Gujjar, P., and Vaughan, R. (2019, January 20\u201324). Classifying Pedestrian Actions In Advance Using Predicted Video Of Urban Driving Scenes. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794278"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Chaabane, M., Trabelsi, A., Blanchard, N., and Beveridge, R. (2019). Looking Ahead: Anticipating Pedestrians Crossing with Future Frames Prediction. arXiv.","DOI":"10.1109\/WACV45572.2020.9093426"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Fang, Z., and L\u00f3pez, A.M. (2018). Is the Pedestrian going to Cross? Answering by 2D Pose Estimation. arXiv.","DOI":"10.1109\/IVS.2018.8500413"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Marginean, A., Brehar, R., and Negru, M. (2019, January 18\u201320). Understanding pedestrian behaviour with pose estimation and recurrent networks. Proceedings of the 2019 6th International Symposium on Electrical and Electronics Engineering (ISEEE), Galati, Romania.","DOI":"10.1109\/ISEEE48094.2019.9136126"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Ghori, O., Mackowiak, R., Bautista, M., Beuter, N., Drumond, L., Diego, F., and Ommer, B. (2018, January 26\u201330). Learning to Forecast Pedestrian Intention from Pose Dynamics. Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China.","DOI":"10.1109\/IVS.2018.8500657"},{"key":"ref_58","unstructured":"Gantier, R., Yang, M., Qian, Y., and Wang, C. (2019, January 27\u201330). Pedestrian Graph: Pedestrian Crossing Prediction Based on 2D Pose Estimation and Graph Convolutional Networks. 
Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Ridel, D., Rehder, E., Lauer, M., Stiller, C., and Wolf, D. (2018, January 4\u20137). A Literature Review on the Prediction of Pedestrian Behavior in Urban Scenarios. Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA.","DOI":"10.1109\/ITSC.2018.8569415"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"1639","DOI":"10.1109\/TPAMI.2017.2728788","article-title":"Learning and inferring \u201cdark matter\u201d and predicting human intents and trajectories in videos","volume":"40","author":"Xie","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Wei, P., Liu, Y., Shu, T., Zheng, N., and Zhu, S. Where and Why are They Looking?. Jointly Inferring Human Attention and Intentions in Complex Tasks. In Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18\u201323 June 2018.","DOI":"10.1109\/CVPR.2018.00711"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Liu, B., Adeli, E., Cao, Z., Lee, K.H., Shenoi, A., Gaidon, A., and Niebles, J.C. (2020). Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction. arXiv.","DOI":"10.1109\/LRA.2020.2976305"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Ranga, A., Giruzzi, F., Bhanushali, J., Wirbel, E., P\u00e9rez, P., Vu, T.H., and Perrotton, X. (2020). VRUNet: Multi-Task Learning Model for Intent Prediction of Vulnerable Road Users. arXiv.","DOI":"10.2352\/ISSN.2470-1173.2020.16.AVM-109"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. 
arXiv.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1016\/0893-6080(91)90009-T","article-title":"Approximation capabilities of multilayer feedforward networks","volume":"4","author":"Hornik","year":"1991","journal-title":"Neural Netw."},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Rehder, E., Kloeden, H., and Stiller, C. Head detection and orientation estimation for pedestrian safety. Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), Qingdao, China, 8\u201311 October 2014.","DOI":"10.1109\/ITSC.2014.6958057"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"K\u00f6hler, S., Goldhammer, M., Zindler, K., Doll, K., and Dietmeyer, K. Stereo-vision-based pedestrian\u2019s intention detection in a moving vehicle. Proceedings of the 2015 IEEE 18th International Conference on Intelligent Transportation Systems, Las Palmas, Spain, 15\u201318 September 2015.","DOI":"10.1109\/ITSC.2015.374"},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"1872","DOI":"10.1109\/TITS.2014.2379441","article-title":"A probabilistic framework for joint pedestrian head and body orientation estimation","volume":"16","author":"Flohr","year":"2015","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Schulz, A.T., and Stiefelhagen, R. Pedestrian intention recognition using latent-dynamic conditional random fields. Proceedings of the 2015 IEEE Intelligent Vehicles Symposium (IV), Seoul, Korea, 28 June\u20131 July 2015.","DOI":"10.1109\/IVS.2015.7225754"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Rasouli, A., Kotseruba, I., and Tsotsos, J.K. (2018, January 4\u20137). Towards Social Autonomous Vehicles: Understanding Pedestrian-Driver Interactions. 
Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA.","DOI":"10.1109\/ITSC.2018.8569324"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Dey, D., and Terken, J. (2017, January 24\u201327). Pedestrian interaction with vehicles: Roles of explicit and implicit communication. Proceedings of the 9th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, Oldenburg, Germany.","DOI":"10.1145\/3122986.3123009"},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Schneemann, F., and Heinemann, P. (2016, January 9\u201314). Context-based detection of pedestrian crossing intention for autonomous driving in urban environments. Proceedings of the 2016 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea.","DOI":"10.1109\/IROS.2016.7759351"},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Yang, F., Sakti, S., Wu, Y., and Nakamura, S. (2019). Make Skeleton-based Action Recognition Model Smaller, Faster and Better. arXiv.","DOI":"10.1145\/3338533.3366569"},{"key":"ref_74","unstructured":"Baradel, F., Wolf, C., and Mille, J. (2018, January 2\u20136). Human Activity Recognition with Pose-driven Attention to RGB. Proceedings of the BMVC 2018\u201429th British Machine Vision Conference, Newcastle, UK."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Liu, J., Shahroudy, A., Xu, D., and Wang, G. Spatio-temporal lstm with trust gates for 3d human action recognition. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11\u201314 October 2016.","DOI":"10.1007\/978-3-319-46487-9_50"},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Yang, Z., Li, Y., Yang, J., and Luo, J. (2018). Action Recognition with Spatio-Temporal Visual Attention on Skeleton Image Sequences. arXiv.","DOI":"10.1109\/ICPR.2018.8546012"},{"key":"ref_77","unstructured":"Maas, A.L. 
(2020, December 09). Rectifier Nonlinearities Improve Neural Network Acoustic Models. Available online: https:\/\/ai.stanford.edu\/~amaas\/papers\/relu_hybrid_icml2013_final.pdf."},{"key":"ref_78","first-page":"1929","article-title":"Dropout: A Simple Way to Prevent Neural Networks from Overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_79","unstructured":"Ioffe, S., and Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv."},{"key":"ref_80","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_81","unstructured":"Smith, S.L., Kindermans, P.J., Ying, C., and Le, Q.V. (2017). Don\u2019t Decay the Learning Rate, Increase the Batch Size. arXiv."},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Fang, H.S., Xie, S., Tai, Y.W., and Lu, C. (2016). RMPE: Regional Multi-person Pose Estimation. arXiv.","DOI":"10.1109\/ICCV.2017.256"},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Cao, Z., Simon, T., Wei, S.E., and Sheikh, Y. (2016). Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. arXiv.","DOI":"10.1109\/CVPR.2017.143"},{"key":"ref_84","unstructured":"Chollet, F. (2020, December 09). Keras. Available online: https:\/\/keras.io."},{"key":"ref_85","unstructured":"Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., and Devin, M. (2020, December 09). TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Available online: tensorflow.org."},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"201","DOI":"10.3758\/BF03212378","article-title":"Visual perception of biological motion and a model for its analysis","volume":"14","author":"Johansson","year":"1973","journal-title":"Percept. 
Psychophys."},{"key":"ref_87","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1002\/mds.870070312","article-title":"Voluntary stimulus-sensitive jerks and jumps mimicking myoclonus or pathological startle syndromes","volume":"7","author":"Thompson","year":"1992","journal-title":"Mov. Disord. Off. J. Mov. Disord. Soc."},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1037\/h0034147","article-title":"Reaction time of young and elderly subjects in relation to perceptual deprivation and signal-on versus signal-off conditions","volume":"8","author":"Kemp","year":"1973","journal-title":"Dev. Psychol."},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"Yang, W., Ouyang, W., Wang, X., Ren, J., Li, H., and Wang, X. (2018). 3D Human Pose Estimation in the Wild by Adversarial Learning. arXiv.","DOI":"10.1109\/CVPR.2018.00551"},{"key":"ref_90","unstructured":"Xiu, Y., Li, J., Wang, H., Fang, Y., and Lu, C. (2018). Pose Flow: Efficient Online Pose Tracking. arXiv."},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Ning, G., and Huang, H. (2019). LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking. arXiv.","DOI":"10.1109\/CVPRW50498.2020.00525"},{"key":"ref_92","doi-asserted-by":"crossref","unstructured":"Xiao, B., Wu, H., and Wei, Y. (2018). Simple Baselines for Human Pose Estimation and Tracking. arXiv.","DOI":"10.1007\/978-3-030-01231-1_29"},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Raaj, Y., Idrees, H., Hidalgo, G., and Sheikh, Y. (2019, January 15\u201320). Efficient Online Multi-Person 2D Pose Tracking With Recurrent Spatio-Temporal Affinity Fields. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00475"},{"key":"ref_94","doi-asserted-by":"crossref","unstructured":"Fang, H.S., Xie, S., Tai, Y.W., and Lu, C. (2017, January 22\u201329). RMPE: Regional Multi-person Pose Estimation. 
Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.256"},{"key":"ref_95","doi-asserted-by":"crossref","unstructured":"Papandreou, G., Zhu, T., Kanazawa, N., Toshev, A., Tompson, J., Bregler, C., and Murphy, K. (2017). Towards Accurate Multi-person Pose Estimation in the Wild. arXiv.","DOI":"10.1109\/CVPR.2017.395"},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Iqbal, U., and Gall, J. (2016). Multi-Person Pose Estimation with Local Joint-to-Person Associations. arXiv.","DOI":"10.1007\/978-3-319-48881-3_44"},{"key":"ref_97","doi-asserted-by":"crossref","unstructured":"Wojke, N., Bewley, A., and Paulus, D. Simple online and realtime tracking with a deep association metric. Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17\u201320 September 2017.","DOI":"10.1109\/ICIP.2017.8296962"},{"key":"ref_98","doi-asserted-by":"crossref","unstructured":"Bewley, A., Ge, Z., Ott, L., Ramos, F., and Upcroft, B. (2016, January 25\u201328). Simple online and realtime tracking. 
Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7533003"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/12\/331\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:43:24Z","timestamp":1760179404000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/12\/331"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,10]]},"references-count":98,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["a13120331"],"URL":"https:\/\/doi.org\/10.3390\/a13120331","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,10]]}}}