{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T16:06:59Z","timestamp":1772813219153,"version":"3.50.1"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686547","type":"electronic"}],"license":[{"start":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T00:00:00Z","timestamp":1772582400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,3,4]]},"abstract":"<jats:p>This paper presents a unified deep learning framework for multi-class detection and tracking of visually similar road users in complex road environments. The proposed system is designed to accurately distinguish and track various road entities\u2014such as cars, buses, trucks, bicycles, motorcycles, pedestrians, and riders\u2014which often exhibit similar visual features. To achieve robust detection and instance-aware tracking, we adopt a three-stage training strategy: (1) a supervised multi-task learning stage for joint object detection and class identification, (2) a self-supervised contrastive learning stage for extracting instance-level feature embeddings without identity labels, and (3) a fine-tuning stage to improve object identification accuracy by refining both the detection and feature embedding heads. The unified network simultaneously outputs object classes, locations, and appearance embeddings, enabling consistent identity association across video frames. Experimental results demonstrate that the proposed method outperforms existing approaches in both fine-grained detection and multi-object tracking accuracy.<\/jats:p>","DOI":"10.3233\/faia260017","type":"book-chapter","created":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T10:20:43Z","timestamp":1772792443000},"source":"Crossref","is-referenced-by-count":0,"title":["Multi-Stage Learning for Visually Similar Road User Detection and Tracking"],"prefix":"10.3233","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-5229-360X","authenticated-orcid":false,"given":"Young Chul","family":"Lim","sequence":"first","affiliation":[{"name":"Division of Mobility Technology, Daegu Gyeongbuk Institute of Science & Technology, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4018-8514","authenticated-orcid":false,"given":"Minsung","family":"Kang","sequence":"additional","affiliation":[{"name":"Division of Mobility Technology, Daegu Gyeongbuk Institute of Science & Technology, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","Machine Learning and Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA260017","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T10:20:43Z","timestamp":1772792443000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA260017"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,4]]},"ISBN":["9781643686547"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia260017","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,4]]}}}