Wiser Human Blog
Does providing an escalation channel for models change their internal activations? [Experiment results]
Adding an escalation channel to an agentic misalignment scenario reduced desperation activations from the model's first tokens and cut blackmail from…
Jun 9 • Francesca Gomez
May 2026
Can the design of an AI agent's decision environment reduce unsanctioned behaviour?
My plan to test and design inference-time controls that shape the behaviour distribution of AI agents
May 19 • Francesca Gomez
April 2026
Is it time for frontier AI developers to start adopting Operational Risk Management?
Five incidents in two months at Anthropic suggest the AI model developer has a process problem, and operational risk management is designed to address exactly this.
Apr 27 • Francesca Gomez
October 2025
Can we steer AI models toward safer actions by making these instrumentally useful?
An empirical study adapting and testing insider risk mitigations for Agentic Misalignment
Oct 22, 2025 • Francesca Gomez
2050: Who's in Control?
‘2050: Who’s in Control?’ is a game built using the ArcWeave platform for people to explore choices and power structures in a world shaped by advanced…
Oct 5, 2025 • Francesca Gomez
April 2025
We need a harm severity impact scale for loss of control
Existing scales for harmful outcomes focus on deaths, dollars and disruptions, but miss human autonomy, agency and reversibility.
Apr 7, 2025 • Francesca Gomez