OpenAI Introduces Operator & Agents


Here is everything you need to know:

Operator is a system that can use a web browser to accomplish tasks. Operator can look at a webpage and interact with it by typing, clicking, and scrolling.

Operator is available as a research preview. It is currently limited to Pro users in the US, with access for Plus users coming later.

Operator can perform a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes.

Here is an example where a user is asking Operator to book a table for two.

Operator instantiates a remote browser. The agent clicks around and interacts with the webpage to complete the task.

If Operator needs a location, it can take it from the user's custom instructions.

For critical actions, Operator asks the user for confirmation.
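
The confirmation step can be pictured as a gate in front of certain actions. The sketch below is only an illustration of that idea, not Operator's implementation: the action names, the `CRITICAL_ACTIONS` set, and the `confirm` callback are all hypothetical.

```python
from typing import Callable, List

# Hypothetical set of actions treated as "critical" for this illustration.
CRITICAL_ACTIONS = {"submit_order", "send_email", "confirm_booking"}


def execute_with_confirmation(
    actions: List[str], confirm: Callable[[str], bool]
) -> List[str]:
    """Run actions in order, asking the user before any critical one.

    Returns the actions that were actually executed. If the user declines
    a critical action, the run stops there and nothing further is executed.
    """
    executed: List[str] = []
    for action in actions:
        if action in CRITICAL_ACTIONS and not confirm(action):
            break  # user said no: stop instead of acting on their behalf
        executed.append(action)
    return executed
```

For instance, with a plan of `["open_site", "pick_time", "confirm_booking"]` and a user who declines, only the first two steps run and the booking is never confirmed.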

You can use Operator for shopping, for example by providing a shopping list as an image.

Operator is based on a model called Computer-Using Agent (CUA). Combining GPT-4o's vision capabilities with advanced reasoning learned through reinforcement learning (RL), CUA is trained to interact with graphical user interfaces (GUIs): the buttons, menus, and text fields people see on a screen.

CUA works from screenshots, no APIs! It interacts with the browser using only the actions a mouse and keyboard allow, which removes the need for custom API integrations. It uses an inner monologue to decide what action to take next based on the screenshots.
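
To make the screenshot-to-action cycle concrete, here is a minimal sketch of such a loop. It is not OpenAI's code: `FakeBrowser`, `Action`, `run_agent`, and the scripted policy are hypothetical stand-ins, and a real system would call a vision model where the scripted policy is used here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""
    thought: str = ""  # the "inner monologue" behind this action


class FakeBrowser:
    """Hypothetical stand-in for a remote browser; it only records inputs."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        # A real browser would return pixels; here it is just a frame label.
        return f"frame-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        elif action.kind == "scroll":
            self.log.append(f"scroll({action.y})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


Policy = Callable[[str, bytes, List[str]], Action]


def run_agent(task: str, browser: FakeBrowser, policy: Policy,
              max_steps: int = 20) -> List[str]:
    """Observe a screenshot, pick an action, apply it, and repeat."""
    thoughts: List[str] = []
    for _ in range(max_steps):
        frame = browser.screenshot()            # perceive
        action = policy(task, frame, thoughts)  # reason (model call in practice)
        thoughts.append(action.thought)
        if action.kind == "done":
            break
        browser.apply(action)                   # act via mouse/keyboard only
    return thoughts


def scripted_policy(task: str, frame: bytes, thoughts: List[str]) -> Action:
    """Toy policy that replays a fixed script instead of calling a model."""
    script = [
        Action("click", x=120, y=40, thought="Focus the search box."),
        Action("type", text=task, thought="Enter the request."),
        Action("done", thought="The task looks complete."),
    ]
    return script[min(len(thoughts), len(script) - 1)]
```

Calling `run_agent("table for two", FakeBrowser(), scripted_policy)` walks through one click, one typed string, and a final "done" step. The only interface between the agent and the page is the screenshot going in and the mouse or keyboard action coming out, which is why no site-specific API is involved.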

You can also take over from Operator if you want to add instructions, and then hand control back. Operator can't see what happens while you are in control -- this interaction is private.

Below is an example for buying tickets or finding information about events.

You can also run tasks in parallel. If you don't specify a website, Operator can simply browse the web instead of going directly to a particular app or service.
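
As a rough picture of several tasks being in flight at once, the sketch below uses Python's `asyncio`. It is an assumption-laden toy: `run_task` is a hypothetical placeholder for one agent session, and the tasks interleave on a single thread rather than running in separate remote browsers.

```python
import asyncio
from typing import Dict, List


async def run_task(name: str, steps: int) -> str:
    """Hypothetical placeholder for one agent session working on one task."""
    for _ in range(steps):
        # Yield to the event loop, standing in for one browser step.
        await asyncio.sleep(0)
    return f"{name}: finished after {steps} steps"


async def run_all(tasks: Dict[str, int]) -> List[str]:
    # gather() starts every task and returns results in the order given.
    return await asyncio.gather(*(run_task(n, s) for n, s in tasks.items()))


def run_in_parallel(tasks: Dict[str, int]) -> List[str]:
    return asyncio.run(run_all(tasks))
```

For example, `run_in_parallel({"book table": 3, "order groceries": 5})` lets both tasks make progress in turns and returns one result string per task.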

Here are some details on the safety aspect of Operator.

Operator can refuse harmful tasks, avoid blocked websites, and prevent spam. Confirmation is a key mitigation strategy built into Operator. There is also a prompt injection monitor that serves as an extra layer of security.

Here is a good example of when Operator requires the user to take control.

In this case, it's asking for an email address to continue with signing in. That's why that part of the interaction is kept private.

Here is the performance of CUA on the OSWorld and WebArena benchmarks. CUA performs better than previous SoTA but still has a long way to go when compared to human performance.

The OpenAI folks mentioned that the model will be made available in the coming weeks. Sam ended the demo by saying that this is the beginning of their next step into agents (level 3 tier).

Here is the full demo:

