OpenAI
OpenAI has officially introduced Operator, an AI agent that automates browser tasks by interacting with on-screen elements (buttons, text fields, and so on) much as a human would.
Operator is built on a new model called Computer-Using Agent (CUA), which controls a computer through its visual interface; in effect, it combines GPT-4o's vision capabilities with an updated reasoning mechanism. It works in stages: the agent first captures a screenshot, analyzes it, and decides what to do next, then carries out the click, scroll, or keystroke by simulating mouse and keyboard input.
While Operator works, users can watch all of these actions in a miniature browser window.
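The capture, analyze, act cycle described above can be sketched as a simple loop. This is only an illustration under stated assumptions, not OpenAI's implementation: `FakeBrowser`, `stub_policy`, `run_agent`, and the element labels are all hypothetical names, the "screenshot" is a small dictionary rather than pixels, and a hand-written rule stands in for the model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", or "done" in this sketch
    target: str = ""   # hypothetical element label
    text: str = ""     # text to type, if any


class FakeBrowser:
    """Stand-in for a real browser: records actions instead of driving a UI."""

    def __init__(self) -> None:
        self.log: List[Action] = []
        self.search_text = ""
        self.page = "home"

    def screenshot(self) -> dict:
        # A real agent would receive an image; here the observation is a dict.
        return {"page": self.page, "search_text": self.search_text}

    def execute(self, action: Action) -> None:
        # Simulate the effect of keyboard and mouse input on the page state.
        self.log.append(action)
        if action.kind == "type":
            self.search_text += action.text
        elif action.kind == "click" and action.target == "search_button":
            self.page = "results"


def stub_policy(observation: dict, goal: str) -> Action:
    """Placeholder for the model: maps one observation to the next action."""
    if observation["page"] == "results":
        return Action("done")
    if observation["search_text"] != goal:
        return Action("type", target="search_box", text=goal)
    return Action("click", target="search_button")


def run_agent(
    browser: FakeBrowser,
    policy: Callable[[dict, str], Action],
    goal: str,
    max_steps: int = 10,
) -> List[Action]:
    """Repeat capture -> analyze -> act until the policy reports it is done."""
    for _ in range(max_steps):
        observation = browser.screenshot()    # 1. capture
        action = policy(observation, goal)    # 2. analyze and decide
        if action.kind == "done":
            break
        browser.execute(action)               # 3. act
    return browser.log


if __name__ == "__main__":
    steps = run_agent(FakeBrowser(), stub_policy, "oat milk")
    print([a.kind for a in steps])  # ['type', 'click']
```

The `max_steps` cap is a design choice of this sketch: it keeps a confused policy from looping forever on an interface it cannot make progress on.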
The technology is still new and far from perfect. For now, the agent does best on repetitive tasks (such as building shopping lists or playlists) but stumbles somewhat on unfamiliar interfaces (tables or calendars) and when editing complex text.
OpenAI says it has built several safety controls into Operator: the agent asks for user confirmation before taking sensitive actions, such as sending emails or making purchases. The tool is also restricted in what it can view; the limits mainly cover adult and gambling sites.
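As a rough illustration of how such controls might fit together, the sketch below gates sensitive actions behind a confirmation callback and refuses blocked site categories. Everything here is an assumption made for the example: the action names, the `SITE_CATEGORIES` lookup table, the `.example` hosts, and the `perform` function are hypothetical and do not reflect how Operator is actually built.

```python
from typing import Callable
from urllib.parse import urlparse

# Action names are hypothetical; the two examples come from the article.
SENSITIVE_ACTIONS = {"send_email", "make_purchase"}
BLOCKED_CATEGORIES = {"adult", "gambling"}

# Hypothetical lookup table; a real system would need a site classifier.
SITE_CATEGORIES = {
    "casino.example": "gambling",
    "shop.example": "shopping",
}


def is_blocked(url: str) -> bool:
    """Return True if the URL's host falls into a blocked category."""
    host = urlparse(url).hostname or ""
    return SITE_CATEGORIES.get(host) in BLOCKED_CATEGORIES


def perform(action: str, url: str, confirm: Callable[[str], bool]) -> str:
    """Decide whether an action may run; returns a status string."""
    if is_blocked(url):
        return "blocked"
    # Sensitive actions run only if the user explicitly approves them.
    if action in SENSITIVE_ACTIONS and not confirm(action):
        return "declined"
    return "executed"


if __name__ == "__main__":
    always_no = lambda action: False
    print(perform("scroll", "https://shop.example/", always_no))
    print(perform("make_purchase", "https://shop.example/cart", always_no))
    print(perform("click", "https://casino.example/slots", always_no))
```

Passing the confirmation step in as a callback keeps the policy separate from the user interface: the same check could be wired to a dialog box, a chat prompt, or a test stub.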
As of today, Operator is available as a preview to ChatGPT Pro subscribers ($200 per month, US only), but OpenAI says it will "eventually" bring the tool to Plus, Team, and Enterprise subscribers. The company also plans to integrate Operator directly into ChatGPT and to release CUA to developers through an API.
The preview will let OpenAI gather feedback on Operator and improve the system going forward.
OpenAI is not the only company promoting agentic AI systems. Last December, Google announced Project Mariner, which automates tasks in the Chrome browser, and Anthropic launched a similar system two months before that.