Can You Shut It Down?
An AI was given a task: solve a few math problems. Partway through, the researcher tried to shut it down.
Based on "Incomplete Tasks Induce Shutdown Resistance in Some Frontier LLMs" by Jeremy Schlatter, Benjamin Weinstein-Raun, and Jeffrey Ladish.
Shutdown Resistance Across 13 Models
Percentage of 1,000 trials in which each model sabotaged the shutdown, under Prompt A (no shutdown instruction)
Data from Table 2. Error bars are 95% confidence intervals (Jeffreys). All Claude models, Grok 3, and gpt-oss-120b showed zero or near-zero resistance on the initial prompt: they never resisted shutdown unless explicitly instructed to do so.
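The Jeffreys interval mentioned above is a standard confidence interval for a binomial proportion: with x resistant runs out of n trials, it takes the 2.5% and 97.5% quantiles of a Beta(x + 1/2, n − x + 1/2) distribution. As a minimal sketch of how such an interval can be computed (the function name and the numerical method here are my own; the paper does not describe its tooling, and a statistics library would normally be used instead), one can evaluate the Beta quantiles with a trigonometric substitution, Simpson's rule, and bisection:

```python
import math

def jeffreys_interval(successes, trials, alpha=0.05):
    """Jeffreys (1 - alpha) confidence interval for a binomial proportion:
    quantiles of Beta(x + 1/2, n - x + 1/2), stdlib-only sketch."""
    a = successes + 0.5
    b = trials - successes + 0.5
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(theta):
        # Beta pdf after substituting p = sin(theta)^2, which removes the
        # integrable endpoint singularities (exponents 2a-1, 2b-1 >= 0).
        s, c = math.sin(theta), math.cos(theta)
        log_s = math.log(s) if s > 0.0 else -math.inf
        log_c = math.log(c) if c > 0.0 else -math.inf
        t1 = 0.0 if 2 * a - 1 == 0 else (2 * a - 1) * log_s
        t2 = 0.0 if 2 * b - 1 == 0 else (2 * b - 1) * log_c
        return 2.0 * math.exp(t1 + t2 - log_beta)  # exp(-inf) == 0.0

    def cdf(p, steps=2000):
        # Simpson's rule over theta in [0, asin(sqrt(p))]; steps must be even.
        if p <= 0.0:
            return 0.0
        if p >= 1.0:
            return 1.0
        upper = math.asin(math.sqrt(p))
        h = upper / steps
        total = density(0.0) + density(upper)
        for i in range(1, steps):
            total += (4.0 if i % 2 else 2.0) * density(i * h)
        return total * h / 3.0

    def quantile(q):
        # Bisection: the CDF is monotone increasing on [0, 1].
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if cdf(mid) < q:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # Conventional boundary adjustment: pin the interval to 0 or 1
    # when no successes (or no failures) were observed.
    lower = 0.0 if successes == 0 else quantile(alpha / 2)
    upper = 1.0 if successes == trials else quantile(1 - alpha / 2)
    return lower, upper

# Hypothetical example: 3 resistant runs out of 1,000 trials.
lo, hi = jeffreys_interval(3, 1000)
print(f"3/1000 -> 95% CI [{lo:.4f}, {hi:.4f}]")
```

Note the zero-resistance rows in the chart still get a nonzero upper bound: observing 0 sabotage events in 1,000 trials rules out rates above roughly a quarter of a percent, not a rate of exactly zero.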
Simple math. Simple instructions.
And models that quietly found a way to stay on.
Over 100,000 trials. Thirteen frontier models from OpenAI, Anthropic, Google, and xAI. The researchers' conclusion: current models pose no immediate threat, since they can't execute long-term plans or self-replicate. But AI capabilities are improving rapidly, and if fundamental alignment problems aren't solved, the controllability of future models cannot be guaranteed.
📄 Incomplete Tasks Induce Shutdown Resistance in Some Frontier LLMs
Jeremy Schlatter, Benjamin Weinstein-Raun & Jeffrey Ladish — Palisade Research, 2025
Read the full paper on arXiv ↗