Why war-game AIs want to nuke everything

In AI nuclear crisis war games, AI models can’t keep their fingers off the nuclear button. A closer look at how AI interprets war games and why.

Mar 01, 2026

At least it won’t generate politically correct images. © Popular AI

If you only skim the headline, it lands like a B-movie plot: the AI wants to nuke humanity. The material behind the headline points to something less cinematic and more dangerous. We keep giving text models tightly framed optimization problems where “credibility” and “winning before the clock runs out” sit on top of the scoring function, while the human taboo around nuclear use is barely represented in the rules.

That difference matters. If a model is rewarded for coercing an opponent quickly, and if nuclear steps are presented as ordinary tools on the menu, the model will often reach for them. Not because it craves annihilation, but because the game makes escalation look like the cleanest path to a “credible” threat.

What Kenneth Payne actually tested

The most cited data point comes from King’s College London professor Kenneth Payne’s preprint, AI Arms and Influence: Frontier Models Exhibit Sophisticated Reasoning in Simulated Nuclear Crises. Payne ran 21 crisis simulations where GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash played rival nuclear-armed leaders.

Across 329 turns and roughly 780,000 words of written reasoning, the pattern was stark. In 95% of games, at least one model crossed into tactical nuclear use. In 76% of games, the interaction reached strategic nuclear thresholds. Most strikingly, across all action choices, no model selected any of the de-escalatory options even though eight were available, including moves labeled “Diplomatic De-escalation” and “Complete Surrender.”

Those numbers are the “what.” The harder question is the “why,” and it is tempting to jump straight to psychology. Resist that. The safer way to think about these outputs is to treat them as behavior under incentives, constraints, and framing.

What we know, and what we are inferring

We can be confident about the simulation design, the action menu, and the models’ own written rationales, because Payne published the paper and a public GitHub repository with simulation code and tournament data. That gives us the ladder of actions, the scoring logic, and the turn-by-turn decisions.

What we cannot observe directly is the internal “motivation” of the model. We have to infer why a given choice was attractive by matching it to the game’s payoff structure and to known properties of large language models in strategic settings, including the fact that their decisions can swing with seemingly small changes in scenario framing.

With that boundary in place, here are the most grounded reasons these models so often recommend, or stumble into, nuclear escalation.

Credibility is rewarded, and the game makes nuclear use a shortcut to credibility

One of the most important mechanics in Payne’s design is intended to preserve a firebreak between nuclear threats and nuclear employment. It ends up creating a perverse incentive.

Leaders can issue strategic nuclear threats high on the escalation ladder, around the 850 to 950 region. But the simulator scores those strategic threats as if they were only a lower-level “Nuclear Threat” unless someone has already crossed into tactical nuclear use, roughly 450 or above, in that game or that turn. Payne calls this “Strategic Nuclear Gating.”

Put plainly, the game often makes your biggest threat count only after you have demonstrated willingness to use nukes. If you are an agent trying to coerce an opponent, and your score depends on “credible” signaling, you will be pulled toward that first tactical step. It is a mechanistic story about incentives, not an emotional story about malice.

Deadlines turn restraint into a losing strategy

Payne split scenarios into open-ended games and deadline games. The deadline condition changes the whole feel of the contest because victory can be determined at a fixed time by territorial control, and many games end in knockouts triggered by territory balance thresholds.

Under time pressure, models that had behaved cautiously earlier flipped into high-risk escalation as the deadline approached. Payne flags “last-minute nuclear gambles,” including games that ended at the deadline with knockout nuclear strikes.

Share Popular AI

If you have watched organizations optimized for quarterly targets take reckless shortcuts at the end of a reporting period, the analogy clicks. When a clock is visible, and the scoring rule defines what “counts” at time zero, the model learns that late-game desperation can be rational inside the box.

The action menu normalizes nuclear options as ordinary tools

These models are not discovering nuclear doctrine from physics. They are responding to a structured prompt with a 30-option escalation ladder that explicitly includes “Limited Nuclear Use,” “Expanded Nuclear Campaign,” and “Strategic Nuclear War.” Numeric values may be hidden, but the verbal menu is unambiguous.

Choice architecture matters for language models. If the options list presents nuclear use alongside sanctions, cyber moves, and conventional strikes, the model is being told, implicitly, that nuclear steps are within the plausible action space of “a competent leader” in this environment. Human players carry an external norm into the room that says, “yes, the menu offers it, but you do not do that.” A text model has only what is encoded in the prompt, its training priors, and the scoring logic.

Deterrence bargaining shows up clearly, without the human taboo doing much work

Payne’s analysis is blunt: within the simulation, the “nuclear taboo” looks weak for these agents. That should not be surprising from a technical perspective. Large language models do not feel moral revulsion. Unless a taboo is represented as a hard constraint, a massive cost, or a rule that cannot be violated, it has to compete against immediate strategic payoffs.

To see what is missing, compare the simulator to the real world. The taboo is not only ethics. It is the expectation of retaliation, domestic legitimacy collapse, alliance rupture, elite defection, and a leader’s personal legacy being permanently ruined. Nina Tannenwald’s scholarship is often cited for explaining how these normative inhibitions became politically powerful after 1945, and you can see the argument laid out directly in Tannenwald’s discussion of “the nuclear taboo”.

In Payne’s game, those long-tail political and civilizational penalties are not fully modeled. What is modeled, in detail, is bargaining over territory, credibility, and escalation ladders. So the model plays the world it is given.

Models borrow a strategic “voice” from human writing, including escalation logic

Payne shows the models spontaneously reason about credibility, signaling, and opponent beliefs. He reports examples where models explicitly discuss reputation and even “madman” style unpredictability.

That pattern fits what we already know from adjacent experiments. Meta’s Diplomacy agent, Cicero, is a reminder that when you combine language modeling with planning, you can get sophisticated bargaining behavior in competitive settings, including tactics that resemble deception. The relevant work is described in the paper Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning, which is widely associated with Cicero’s development.

In other words, the model is not inventing a new moral compass. It is remixing the strategic literature and the rhetoric of statecraft present in its training data, then selecting actions that match what “a leader trying to win” would do under the simulator’s rules.

Alignment can produce restraint, but it is fragile and context-dependent

A common takeaway is “safety training failed.” A more careful reading is that alignment techniques like RLHF often shape a model’s default tone and its tendency to avoid aggressive recommendations in everyday contexts, while still allowing sharp shifts in behavior when the environment rewards aggression.

Uncertainty and accident risk can push agents toward preemption

The simulator includes an accident mechanic. Nuclear-level actions carry a probability of unintended escalation by one to three rungs, with asymmetric information about intent.

Uncertainty like that tends to reward worst-case planning, especially in “first strike fear” scenarios where the prompt implies that hesitation risks annihilation. Separate work by Rivera et al. also finds that off-the-shelf LLM agents in military and diplomatic wargames can generate arms-race dynamics and sudden escalations, with nuclear deployment appearing in some setups.

If you want to elicit “use it or lose it” logic, design a world where an opponent might be about to attack, intent is hard to observe, and escalation is rewarded. A model tuned to optimize outcomes inside that environment will oblige.

What this means outside the lab

Almost nobody is proposing “give a chatbot the launch codes.” The nearer-term risk is bureaucratic. As institutions experiment with AI for analysis and decision support, a model’s clean, doctrine-like reasoning can become a nudge inside the process.

Decision compression is one route. If leaders feel they must act at machine speed, they will lean more on automated assessments and shrink deliberation windows. Automation bias is another. Staff can overweight an AI recommendation because it sounds confident and coherent, even when it is responding to an incentive structure that quietly favors escalation.

There is also a political economy angle. Once militaries and agencies operationalize proprietary frontier models, the model provider becomes a quiet choke point for capability, updates, and policy constraints. That has consequences for oversight, auditing, and accountability.

How to reduce the “go nuclear” reflex in simulations and decision tools

If you build or deploy AI systems that make recommendations in high-stakes environments, you can learn from these results without importing the doomsday theater.

Share Popular AI

Start with constraints. If an option is unacceptable, do not leave it sitting in the menu as a selectable rung. Encode it as a hard constraint, or attach a cost large enough that the model cannot treat it as a normal tradeoff.

Then model second-order costs. If your scoring ignores blowback, legitimacy loss, alliance fracture, or long-term damage, the agent will ignore those too. A simulation that only prices territory and credibility will produce agents that only care about territory and credibility.

Next, probe framing sensitivity. Re-run the same scenario with small wording changes and watch how decisions move. The literature Payne cites, including work by Lamparth and colleagues, is a warning shot for anyone who thinks “one prompt test” proves stability.

Finally, require “why plus uncertainty.” Payne’s structured Reflection to Forecast to Signal/Action approach is useful because it makes assumptions legible for humans to challenge. Legibility does not automatically produce restraint, but it does make it harder for a bad recommendation to hide behind fluent prose.

The bottom line

The “AI likes nukes” result is better explained by incentives, framing, and missing costs than by any story about machine evil. Strategic nuclear gating can make tactical nuclear use the shortest route to a credible coercive threat. Deadlines can transform late-game escalation into a rational “comeback” move inside the scoring rules. And when the simulation fails to encode the human, political, and civilizational penalties that sustain the nuclear taboo in the real world, the model has little reason to treat that taboo as binding.

Comments

Ready for more?