{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T18:43:25Z","timestamp":1779389005061,"version":"3.53.1"},"reference-count":52,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2022,6,14]],"date-time":"2022-06-14T00:00:00Z","timestamp":1655164800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100019998","name":"Suicide Prevention Research Fund Innovation Grant","doi-asserted-by":"publisher","award":["APP1152952"],"award-info":[{"award-number":["APP1152952"]}],"id":[{"id":"10.13039\/501100019998","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Understanding human behaviours through video analysis has seen significant research progress in recent years with the advancement of deep learning. This topic is of great importance to the next generation of intelligent visual surveillance systems which are capable of real-time detection and analysis of human behaviours. One important application is to automatically monitor and detect individuals who are in crisis at suicide hotspots to facilitate early intervention and prevention. However, there is still a significant gap between research in human action recognition and visual video processing in general, and their application to monitor hotspots for suicide prevention. While complex backgrounds, non-rigid movements of pedestrians and limitations of surveillance cameras and multi-task requirements for a surveillance system all pose challenges to the development of such systems, a further challenge is the detection of crisis behaviours before a suicide attempt is made, and there is a paucity of datasets in this area due to privacy and confidentiality issues. 
Most relevant research only applies to detecting suicides such as hangings or jumps from bridges, providing no potential for early prevention. In this research, these problems are addressed by proposing a new modular design for an intelligent visual processing pipeline that is capable of pedestrian detection, tracking, pose estimation and recognition of both normal actions and high risk behavioural cues that are important indicators of a suicide attempt. Specifically, based on the key finding that human body gestures can be used for the detection of social signals that potentially precede a suicide attempt, a new 2D skeleton-based action recognition algorithm is proposed. By using a two-branch network that takes advantage of three types of skeleton-based features extracted from a sequence of frames and a stacked LSTM structure, the model predicts the action label at each time step. It achieved good performance on both the public dataset JHMDB and a smaller private CCTV footage collection on action recognition. Moreover, a logical layer, which uses knowledge from a human coding study to recognise pre-suicide behaviour indicators, has been built on top of the action recognition module to compensate for the small dataset size. It enables complex behaviour patterns to be recognised even from smaller datasets. 
The whole pipeline has been tested in a real-world application of suicide prevention using simulated footage from a surveillance system installed at a suicide hotspot, and preliminary results confirm its effectiveness at capturing crisis behaviour indicators for early detection and prevention of suicide.<\/jats:p>","DOI":"10.3390\/s22124488","type":"journal-article","created":{"date-parts":[[2022,6,15]],"date-time":"2022-06-15T01:39:54Z","timestamp":1655257194000},"page":"4488","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Towards Building a Visual Behaviour Analysis Pipeline for Suicide Detection and Prevention"],"prefix":"10.3390","volume":"22","author":[{"given":"Xun","family":"Li","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, University of New South Wales, Kensington, NSW 2052, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sandersan","family":"Onie","sequence":"additional","affiliation":[{"name":"Black Dog Institute, University of New South Wales, Randwick, NSW 2031, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Morgan","family":"Liang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, University of New South Wales, Kensington, NSW 2052, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mark","family":"Larsen","sequence":"additional","affiliation":[{"name":"Black Dog Institute, University of New South Wales, Randwick, NSW 2031, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Arcot","family":"Sowmya","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, University of New South Wales, Kensington, NSW 2052, 
Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Revathi, A.R., and Kumar, D. (2012). A Survey Of Activity Recognition Additionally, Understanding The Behavior In Video Survelliance. arXiv.","DOI":"10.5121\/csit.2012.2337"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Li, T., Sun, Z., and Chen, X. (2020, January 12\u201315). Group-Skeleton-Based Human Action Recognition in Complex Events. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3416280"},{"key":"ref_3","first-page":"251","article-title":"Deep learning for behaviour recognition in surveillance applications","volume":"Volume 11166","author":"Bouma","year":"2019","journal-title":"Counterterrorism, Crime Fighting, Forensics, and Surveillance Technologies III"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Sultani, W., Chen, C., and Shah, M. (2018, January 18\u201323). Real-World Anomaly Detection in Surveillance Videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00678"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Larsen, M.E., Cummins, N., Boonstra, T.W., O\u2019Dea, B., Tighe, J., Nicholas, J., Shand, F., Epps, J., and Christensen, H. (2015, January 25\u201329). The use of technology in Suicide Prevention. Proceedings of the 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy.","DOI":"10.1109\/EMBC.2015.7320081"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Benton, A., Mitchell, M., and Hovy, D. (2017). Multi-Task Learning for Mental Health using Social Media Text. 
arXiv.","DOI":"10.18653\/v1\/E17-1015"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1076","DOI":"10.1155\/2018\/6157249","article-title":"Supervised learning for suicidal ideation detection in online user content","volume":"2018","author":"Ji","year":"2018","journal-title":"Complexity"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-25773-2","article-title":"Identifying Suicide Ideation and Suicidal Attempts in a Psychiatric Clinical Research Database using Natural Language Processing","volume":"8","author":"Fernandes","year":"2018","journal-title":"Sci. Rep."},{"key":"ref_9","first-page":"994","article-title":"Interventions to reduce suicides at suicide hotspots: A systematic review and meta-analysis","volume":"2","author":"Pirkis","year":"2015","journal-title":"Lancet"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12889-016-3888-x","article-title":"Can CCTV identify people in public transit stations who are at risk of attempting suicide? An analysis of CCTV video recordings of attempters and a comparative investigation","volume":"16","author":"Mishara","year":"2016","journal-title":"BMC Public Health"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Reid, S., Coleman, S., Kerr, D., Vance, P., and O\u2019Neill, S. (2018, January 18\u201321). Feature Extraction with Computational Intelligence for Head Pose Estimation. Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India.","DOI":"10.1109\/SSCI.2018.8628622"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"e27663","DOI":"10.2196\/27663","article-title":"The Use of Closed-Circuit Television and Video in Suicide Prevention: Narrative Review and Future Directions","volume":"8","author":"Onie","year":"2021","journal-title":"JMIR Ment. Health"},{"key":"ref_13","unstructured":"Lin, W. (2011). A Survey on Behavior Analysis in Video Surveillance Applications. 
Video Surveill. IntechOpen, 281\u2013291."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Kim, S., Yun, K., Park, J., and Choi, J.Y. (2019). Skeleton-based Action Recognition of People Handling Objects. arXiv.","DOI":"10.1109\/WACV.2019.00014"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"14557","DOI":"10.1007\/s11042-015-3134-z","article-title":"Application of Sensor Network System to Prevent Suicide from the Bridge","volume":"75","author":"Lee","year":"2016","journal-title":"Multimed. Tools Appl."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Bouachir, W., and Noumeir, R. (2016, January 23\u201325). Automated video surveillance for preventing suicide attempts. Proceedings of the 7th International Conference on Imaging for Crime Detection and Prevention Automated Video Surveillance for Preventing Suicide Attempts, Madrid, Spain.","DOI":"10.1049\/ic.2016.0081"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2934","DOI":"10.1109\/JSEN.2014.2332070","article-title":"Detection of a Suicide by Hanging Based on a 3-D Image Analysis","volume":"14","author":"Lee","year":"2014","journal-title":"IEEE Sens. J."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"e021076","DOI":"10.1136\/bmjopen-2017-021076","article-title":"Behaviours preceding suicides at railway and underground locations: A multimethodological qualitative approach","volume":"8","author":"Mackenzie","year":"2018","journal-title":"BMJ Open"},{"key":"ref_19","unstructured":"Simonyan, K., and Zisserman, A. (2014). Two-Stream Convolutional Networks for Action Recognition in Videos, MIT Press. NIPS\u201914."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhou, B., Andonian, A., Oliva, A., and Torralba, A. (2018, January 8\u201314). Temporal Relational Reasoning in Videos. 
Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01246-5_49"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Qiu, Z., Yao, T., and Mei, T. (2017, January 22\u201329). Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.590"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Crasto, N., Weinzaepfel, P., Alahari, K., and Schmid, C. (2019, January 15\u201320). MARS: Motion-Augmented RGB Stream for Action Recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00807"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long Short-Term Memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Donahue, J., Hendricks, L.A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., and Darrell, T. (2015, January 7\u201312). Long-term Recurrent Convolutional Networks for Visual Recognition and Description. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"ref_25","unstructured":"Ng, J.Y.H., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., and Toderici, G. (2015). Beyond Short Snippets: Deep Networks for Video Classification. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"201","DOI":"10.3758\/BF03212378","article-title":"Visual perception of biological motion and a model for its analysis","volume":"14","author":"Johansson","year":"1973","journal-title":"Percept. 
Psychophys."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yao, A., Gall, J., Fanelli, G., and Gool, L.V. (2011, January 7\u201310). Does Human Action Recognition Benefit from Pose Estimation?. Proceedings of the British Machine Vision Conference, Swansea, UK.","DOI":"10.5244\/C.25.67"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Jhuang, H., Gall, J., Zuffi, S., Schmid, C., and Black, M.J. (2013, January 1\u20138). Towards understanding action recognition. Proceedings of the International Conf. on Computer Vision (ICCV), Sydney, NSW, Australia.","DOI":"10.1109\/ICCV.2013.396"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1007\/s11263-012-0594-8","article-title":"Dense trajectories and motion boundary descriptors for action recognition","volume":"103","author":"Wang","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1676","DOI":"10.1109\/TVCG.2010.272","article-title":"Learning a 3D Human Pose Distance Metric from Geometric Pose Descriptor","volume":"17","author":"Chen","year":"2010","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Yang, F., Wu, Y., Sakti, S., and Nakamura, S. (2019). Make Skeleton-based Action Recognition Model Smaller, Faster and Better. arXiv.","DOI":"10.1145\/3338533.3366569"},{"key":"ref_32","unstructured":"De Smedt, Q., Wannous, H., Vandeborre, J.P., Guerry, J., Saux, B.L., and Filliat, D. (2017, January 23\u201324). 3D Hand Gesture Recognition Using a Depth and Skeletal Dataset: SHREC\u201917 Track. Proceedings of the Workshop on 3D Object Retrieval, Lyon, France. 3Dor \u201917."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Choutas, V., Weinzaepfel, P., Revaud, J., and Schmid, C. (2018, January 18\u201323). PoTion: Pose MoTion Representation for Action Recognition. 
Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00734"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Duan, H., Zhao, Y., Chen, K., Shao, D., Lin, D., and Dai, B. (2021). Revisiting skeleton-based action recognition. arXiv.","DOI":"10.1109\/CVPR52688.2022.00298"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Ludl, D., Gulde, T., and Curio, C. (July, January 30). Simple yet Efficient Real-Time Pose-Based Action Recognition. Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand.","DOI":"10.1109\/ITSC.2019.8917128"},{"key":"ref_36","unstructured":"Du, Y., Wang, W., and Wang, L. (2015, January 7\u201312). Hierarchical recurrent neural network for skeleton based action recognition. In Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Shahroudy, A., Liu, J., Ng, T.T., and Wang, G. (2016, January 27\u201330). NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.115"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zhu, W., Lan, C., Xing, J., Zeng, W., Li, Y., Shen, L., and Xie, X. (2016, January 12\u201317). Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA. AAAI\u201916.","DOI":"10.1609\/aaai.v30i1.10451"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhang, S., Liu, X., and Xiao, J. (2017, January 24\u201331). On geometric features for skeleton-based action recognition using multilayer lstm networks. 
Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA.","DOI":"10.1109\/WACV.2017.24"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1016\/j.eswa.2018.02.013","article-title":"Surveillance scene representation and trajectory abnormality detection using aggregation of multiple concepts","volume":"101","author":"Ahmed","year":"2018","journal-title":"Expert Syst. Appl."},{"key":"ref_41","unstructured":"Jocher, G. (2021, February 01). Ultralytics\/yolov5. Available online: https:\/\/github.com\/ultralytics\/yolov5."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft COCO: Common Objects in Context. Proceedings of the Computer Vision\u2014ECCV 2014, 13th European Conference, Part IV2014, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Wojke, N., Bewley, A., and Paulus, D. (2017, January 17\u201320). Simple online and realtime tracking with a deep association metric. Proceedings of the IEEE International Conference on Image Processing (ICIP), Beijing, China.","DOI":"10.1109\/ICIP.2017.8296962"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"G\u00fcler, R.A., Neverova, N., and Kokkinos, I. (2018, January 13\u201323). Densepose: Dense human pose estimation in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00762"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1109\/TPAMI.2019.2929257","article-title":"OpenPose: Realtime multi-person 2D pose estimation using Part Affinity Fields","volume":"43","author":"Cao","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Sun, K., Xiao, B., Liu, D., and Wang, J. (2019, January 16\u201320). Deep High-Resolution Representation Learning for Human Pose Estimation. Proceedings of the Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Andriluka, M., Pishchulin, L., Gehler, P., and Schiele, B. (2014, January 23\u201328). 2D Human Pose Estimation: New Benchmark and State of the Art Analysis. Proceedings of the Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.471"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1109\/72.279181","article-title":"Learning long-term dependencies with gradient descent is difficult","volume":"5","author":"Bengio","year":"1994","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_49","unstructured":"Li, C., Wang, P., Wang, S., Hou, Y., and Li, W. (2017, January 12\u201314). Skeleton-based action recognition using LSTM and CNN. Proceedings of the International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong, China."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Zolfaghari, M., Oliveira, G., Sedaghat, N., and Brox, T. (2017). Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection. arXiv.","DOI":"10.1109\/ICCV.2017.316"},{"key":"ref_51","unstructured":"Feichtenhofer, C., Fan, H., Malik, J., and He, K. (November, January 27). Slowfast networks for video recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Duan, H., Wang, J., Chen, K., and Lin, D. (2022). PYSKL: Towards Good Practices for Skeleton Action Recognition. 
arXiv.","DOI":"10.1145\/3503161.3548546"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/12\/4488\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:30:56Z","timestamp":1760139056000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/12\/4488"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,14]]},"references-count":52,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["s22124488"],"URL":"https:\/\/doi.org\/10.3390\/s22124488","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,14]]}}}