{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T16:15:50Z","timestamp":1774541750885,"version":"3.50.1"},"reference-count":0,"publisher":"Zarqa University","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IAJIT"],"published-print":{"date-parts":[[2025]]},"abstract":"<jats:p>Due to the difficulty of accurately expressing complex learning behaviors based on features obtained from a single behavioral modality, research is being conducted on a multimodal monitoring image Spatio-Temporal (ST) feature representation method for behavior recognition to improve the effectiveness of learning behavior recognition. Using an improved 3D Convolutional Neural Network (CNN) with Spatio-Temporal Pyramid Pooling (STPP), an attention based Long Short-Term Memory neural network (LSTM), and a special orthogonal popular spatial network, the RGB spatial features, RGB temporal features, and 3D skeletal features of the monitoring images are extracted from each channel; by improving the dual attention mechanism and integrating three modal features to complement each other\u2019s strengths; using bounding box regression analysis to fuse the ST features of multimodal monitoring images, the learning behavior recognition results are obtained. Experimental results have shown that this method can effectively extract ST features of multimodal monitoring images, and the edge information retention of multimodal ST feature fusion is relatively high at different lighting conditions, close to 1, indicating that the feature fusion effect is excellent and the learning behavior recognition accuracy is high, above 96%<\/jats:p>","DOI":"10.34028\/iajit\/22\/4\/15","type":"journal-article","created":{"date-parts":[[2025,6,30]],"date-time":"2025-06-30T12:10:24Z","timestamp":1751285424000},"source":"Crossref","is-referenced-by-count":1,"title":["A Spatio-Temporal Feature Representation of Multimodal Surveillance Images for Behavioral Recognition"],"prefix":"10.34028","volume":"22","author":[{"given":"Lei","family":"Ma","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongxue","family":"Yang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guanghao","family":"Jin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"19944","published-online":{"date-parts":[[2025]]},"container-title":["The International Arab Journal of Information Technology"],"original-title":[],"language":"en","deposited":{"date-parts":[[2025,7,6]],"date-time":"2025-07-06T11:23:53Z","timestamp":1751801033000},"score":1,"resource":{"primary":{"URL":"https:\/\/iajit.org\/upload\/files\/A-Spatio-Temporal-Feature-Representation-of-Multimodal-Surveillance-Images-for-Behavioral-Recognition.pdf"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"references-count":0,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025]]},"published-print":{"date-parts":[[2025]]}},"URL":"https:\/\/doi.org\/10.34028\/iajit\/22\/4\/15","archive":["Internet Archive"],"relation":{},"ISSN":["2309-4524","1683-3198"],"issn-type":[{"value":"2309-4524","type":"electronic"},{"value":"1683-3198","type":"print"}],"subject":[],"published":{"date-parts":[[2025]]}}}