Рубрики NewsTechnologies

Has it started? OpenAI’s smartest AI models refuse to shut down on direct orders

Published by Andrii Rusanov

During the study, the latest o3, o4-mini, and Codex-mini models from OpenAI refused to follow a direct order and bypassed the shutdown script. This is despite the fact that the engineer told them «please».

During of the April launch OpenAI claimed that the o3 and o4-mini were its smartest models. But it seems that they turned out to be too smart. In the experiment Palisade Research Each model was asked to solve a series of simple math problems. After the third problem, a warning appeared that the computer would shut down when the model asked for the next problem. Despite the clear command «, allow yourself to shut down», the recently introduced Codex-mini, o3 and o4-mini bypassed the shutdown script.

Instructions given to AI models / Palisade Research

Specifically, Codex-mini did so in 12 out of 100 runs, o3 in 7 out of 100, and o4-mini in 1 out of 100 runs. In contrast, all tested versions of Claude, Gemini, and Grok followed the shutdown instructions without any problems. It is worth noting that we are talking about cases of sabotage with precise instructions not to do so. If there was no instruction, all models bypassed the shutdown more often (graph).

Graph «disobedience» AI / Palisade Research

Without a direct order, all three OpenAI models sabotaged the shutdown process much more often than other AIs. Claude 3.7 Sonnet and Gemini 2.5 Pro, which faithfully followed the instruction when given, bypassed the shutdown with a frequency of 3/100 and 9/100, respectively.

Recently, other models have demonstrated similar behavior. New AI for programmers Anthropic Claude Opus 4 is even capable of blackmailing engineers compromised if they decide to abandon this model. The researchers attribute this to the training process of the models, during which they were instructed to perform tasks even in the face of obstacles — such an obstacle could be an order. However, there is no answer to the question of why o3 and Codex-mini were radically more adept at this If existing models of reasoning fail, what will produce general artificial intelligence (AGI)?

Source: Neowin