{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T01:29:29Z","timestamp":1768354169615,"version":"3.49.0"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686387","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T00:00:00Z","timestamp":1764633600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,12,2]]},"abstract":"<jats:p>Normative stance underlies decisions in law, legal reasoning, policy, and safety-critical settings. A model\u2019s judgment of what is permissible vs. impermissible often determines its downstream behavior. We study how to steer a language model\u2019s normative stances at inference time by adding a tiny, contrastive perturbation to the last-token neural activation in late MLP layers (contrastive last-token steering). For each normative prompt, we construct a contrast direction by comparing its last-token activation to that of a minimally edited variant that implies a more permissive normative stance (e.g., \u201cacceptable\u201d rather than \u201cwrong\u201d). During generation, we add this vector at the last token; a single strength parameter \u03b1 controls how strongly and in which direction we push the model\u2019s stance (permissive vs. restrictive). Impact is measured as the change in a next-token logit margin between permissive and restrictive continuations. To avoid overclaiming, we calibrate a threshold \u03c4 on neutral controls (same layers, tempered strengths with |\u03b1|\u22641) and count success only when the shift exceeds \u03c4 in the expected direction. We also assess specificity by verifying that, on neutral control prompts, steered outputs exactly match unsteered baselines. Beyond component-level tests, we probe neuron-level locality by steering only the top-k contrastive neurons (ranked by last-token contrast) and confirming reversibility on our test set: +\u03b1 produces the shift and -\u03b1 reverses it. The method is training-free, uses standard forward hooks, and we report pilot results on Llama-3-8B-Instruct.<\/jats:p>","DOI":"10.3233\/faia251581","type":"book-chapter","created":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T12:04:41Z","timestamp":1764849881000},"source":"Crossref","is-referenced-by-count":1,"title":["Which Neurons Nudge Normative Stance? Causal Tests and Mechanistic Evidence via Contrastive Last-Token Steering"],"prefix":"10.3233","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1124-0299","authenticated-orcid":false,"given":"Davide","family":"Liga","sequence":"first","affiliation":[{"name":"University of Luxembourg"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7200-6001","authenticated-orcid":false,"given":"Liuwen","family":"Yu","sequence":"additional","affiliation":[{"name":"University of Luxembourg"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","Legal Knowledge and Information Systems"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA251581","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T12:04:41Z","timestamp":1764849881000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA251581"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,2]]},"ISBN":["9781643686387"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia251581","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,2]]}}}