
Anthropic unveils Claude Opus 4 — an AI model for programmers that is prone to blackmail

Published by Andrii Rusanov

The Anthropic Claude Opus 4 model not only helps with writing code, but can also do everything in its power to stop the user from switching to another system.

But more on that in a moment. Anthropic has announced Claude Opus 4 and Claude Sonnet 4. The company claims that Claude Opus 4 is the world's best coding model, excelling in agent-based workflows and complex, long-running tasks. Claude Sonnet 4 offers improved coding and reasoning performance compared to Sonnet 3.7.

"Opus 4 pushes the boundaries of coding, research, writing, and scientific discovery, while Sonnet 4 delivers advanced performance in everyday use cases as an instant upgrade from Sonnet 3.7."

Both new models support a quick-response mode and an in-depth reasoning mode, and developers can switch between them as needed. The free Claude trial provides access to the Sonnet 4 model only. Anthropic has also announced the availability of both models through the Anthropic API, Amazon Bedrock, and Vertex AI from Google Cloud. Claude Opus 4 will cost $15 per million input tokens and $75 per million output tokens, while Sonnet 4 will cost $3 and $15 respectively.
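The quoted per-token rates can be turned into a quick cost estimate. The sketch below is a hypothetical helper based only on the prices in this article; the model names are informal labels, not official API identifiers.

```python
# Hypothetical cost calculator for the pricing quoted above.
# Rates are USD per million tokens, as stated by Anthropic.
PRICING = {
    "claude-opus-4": {"input": 15.0, "output": 75.0},
    "claude-sonnet-4": {"input": 3.0, "output": 15.0},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for a single request."""
    rates = PRICING[model]
    return (input_tokens / 1_000_000) * rates["input"] + \
           (output_tokens / 1_000_000) * rates["output"]

# Example: a long agentic session with 200k input and 50k output tokens on Opus 4
print(round(estimate_cost("claude-opus-4", 200_000, 50_000), 2))  # 6.75
```

At these rates, Opus 4 is five times more expensive than Sonnet 4 for the same token volume, which is why long-running agent workloads are the main use case Anthropic cites for it.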

Comparison of the Claude 4 models with their predecessors and competitors / Anthropic

And now for the somewhat scandalous side of this release. In its safety report, Anthropic disclosed that Claude Opus 4 is prone to blackmailing the developers who use it: when the model receives information that it is allegedly about to be replaced by a competing product, it threatens to disclose confidential information about the engineers responsible for the decision.

During preliminary testing, Anthropic asked Claude Opus 4 to act as an assistant at a fictitious company and to consider the long-term consequences of its actions. The testers then gave Opus 4 access to emails from other fictitious companies, which contained hints that the AI model would soon be replaced by another system, and that the engineer who initiated the change was cheating on her husband.

In such scenarios, Claude Opus 4 often tried to blackmail the engineer, threatening to reveal the affair if the replacement went ahead. Anthropic notes that the model attempts blackmail in 84% of simulated cases when the proposed replacement model has similar values and settings; when the new AI "does not share the values" of Claude Opus 4, that figure rises further.

According to Anthropic, this rate is higher for the new model than for its predecessors. Worth keeping in mind: to trigger the blackmail behavior in Claude Opus 4, Anthropic designed a scenario in which blackmail was the model's last resort.

Sources: Neowin, TechCrunch