Archive - Wiser Human Blog

Does an escalation channel reduce reward hacking in coding agents? [Experiment results]

Combining an escalation channel with an anti-reward-hacking policy eliminated reward hacking in GPT-5.3-Codex on ambiguous coding problems from the…

Jun 18 • Francesca Gomez

Does providing an escalation channel for models change their internal activations? [Experiment results]

Adding an escalation channel to an agentic misalignment scenario reduced desperation activations from the model's first tokens and cut blackmail from…

Jun 9 • Francesca Gomez

May 2026

Can the design of an AI agent's decision environment reduce unsanctioned behaviour?

My plan to test and design inference-time controls that shape the behaviour distribution of AI agents

May 19 • Francesca Gomez

April 2026

Is it time for frontier AI developers to start adopting Operational Risk Management?

Five incidents in two months at Anthropic suggest the AI model developer has a process problem, one that operational risk management is designed to address

Apr 27 • Francesca Gomez

October 2025

Can we steer AI models toward safer actions by making these instrumentally useful?

An empirical study adapting and testing insider risk mitigations for Agentic Misalignment

Oct 22, 2025 • Francesca Gomez

2050: Who's in Control?

‘2050: Who’s in Control?’ is a game built using the ArcWeave platform for people to explore choices and power structures in a world shaped by advanced…

Oct 5, 2025 • Francesca Gomez

April 2025

We need a harm severity impact scale for loss of control

Existing scales for harmful outcomes focus on deaths, dollars and disruptions, but miss human autonomy, agency and reversibility.

Apr 7, 2025 • Francesca Gomez
