What Does OpenAI's Operator Mean For Web Design?

By Pete Czech

There were two major stories in AI last week. First, DeepSeek (not linking to it out of suspicion) is garnering much attention for its performance relative to cost. Oh, and for being from China. The news has sent the stock market tumbling today - is it possible that there is an AI engine optimized to use far fewer resources? We'll see about that...

But the news of last week that faded into obscurity too quickly is that OpenAI unveiled Operator. This AI agent automates online tasks like form completion, grocery ordering, and travel booking. While there are other similar projects available, such as Browser-Use, this one performs quite well and easily does various routine tasks. Best of all - anyone can use this now with zero development knowledge.

Powered by the Computer-Using Agent (CUA) model, Operator combines GPT-4o's visual processing with enhanced reasoning to interact naturally with web interfaces through clicking, typing, and navigation. I'll share more details on how it works below for those who are curious.

This automation tool, currently available as a research preview to U.S. ChatGPT Pro subscribers, represents a significant advancement in AI-assisted web interaction. By handling routine digital tasks autonomously, Operator aims to streamline users' online experiences and reduce time spent on manual operations. And it does a pretty good job. Having tested other agents, including Claude's comparable capability (which required some dev know-how), I found that Operator exceeds expectations.

How It Works

Operator is an AI agent designed to perform web-based tasks autonomously on users' behalf. It interacts with web elements such as buttons, menus, and text fields, enabling it to handle tasks like filling out forms, ordering groceries, and creating memes. This capability allows users to delegate repetitive or time-consuming online activities to the AI, streamlining daily routines and enhancing productivity.

At its core, Operator utilizes a model known as the Computer-Using Agent (CUA), which combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning. This integration enables Operator to interpret visual elements on web pages and interact with them effectively. In layman's terms, it grabs and analyzes screenshots and then employs virtual mouse and keyboard inputs. By doing this, Operator can autonomously navigate web interfaces to accomplish specified tasks.

This is not to be confused with the ability of LLMs to access text content and learn from it. In this instance, the LLM interprets what it is "seeing" via screenshots and makes logical decisions to move forward. This is an important differentiating factor. LLMs learn by absorbing text, which is why we keep discussing the need for enhanced website AI readiness. Operator approaches from another perspective – it grabs the visual of the page and then interprets based on that.

To summarize, it's doing something incredible: seeing, thinking, and taking action.
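The see, think, act cycle can be sketched as a simple loop. This is a toy illustration only, not OpenAI's implementation: `FakeBrowser`, `toy_policy`, and `run_agent` are hypothetical names, the "screenshot" is a text description rather than pixels, and the policy is a few hard-coded rules standing in for the model.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """A single virtual input, such as a click or a burst of typing."""
    kind: str          # "type", "click", or "done"
    target: str = ""   # label of the on-screen element (hypothetical)
    text: str = ""     # text to type, if any


class FakeBrowser:
    """Stand-in for a real browser: one text field and a submit button."""

    def __init__(self) -> None:
        self.field = ""
        self.submitted = False

    def screenshot(self) -> str:
        # A real agent would receive pixels; text keeps the sketch runnable.
        return f"field={self.field!r} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.field = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def toy_policy(observation: str, goal: str) -> Action:
    """Hypothetical stand-in for the model: pick the next step from what it sees."""
    if "submitted=True" in observation:
        return Action("done")
    if f"field={goal!r}" in observation:
        return Action("click", target="submit")
    return Action("type", target="field", text=goal)


def run_agent(browser: FakeBrowser, policy, goal: str, max_steps: int = 10) -> list:
    """Repeat see -> think -> act until the policy reports it is done."""
    history = []
    for _ in range(max_steps):
        observation = browser.screenshot()   # see
        action = policy(observation, goal)   # think
        history.append(action)
        if action.kind == "done":
            break
        browser.apply(action)                # act
    return history


browser = FakeBrowser()
steps = run_agent(browser, toy_policy, "window seat")
print([step.kind for step in steps])
```

The point of the sketch is the shape of the loop: every decision is made from a fresh look at the page, which is why what is visibly rendered matters so much to an agent like this.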

Currently, Operator is available as a research preview for ChatGPT Pro users in the U.S., meaning you'll need to commit $200 per month to utilize it. Despite the cost and early hiccups, Operator represents a significant advancement in AI-driven task automation. OpenAI is apparently also collaborating with companies like Instacart, Uber, and eBay to enhance Operator's capabilities and ensure seamless integration with various services.

While Operator is still in its early stages and may encounter challenges with complex web interfaces, it reflects the growing focus on AI agents that can autonomously perform tasks, potentially transforming how users interact with technology and manage daily activities. As development progresses, Operator aims to become a valuable tool for enhancing efficiency and productivity.

What You Can Do With It

OpenAI's release video featured an engineer who prompted Operator to find a recipe for his favorite meal and then order the ingredients from Instacart. This is pretty amazing. However, it still requires user interaction and confirmation for many steps. There are now many videos circulating on YouTube with folks trying different combinations of tasks. Suffice it to say, anything you can do online, Operator can try to do. It will stumble with more sophisticated applications, but it also gets pretty far along most processes you'll throw at it.

I have already set up Operator to do a few things for me:

Tell me when new properties go up for sale in my town – I'm always looking for a good investment.
Check United Airlines to change my seat to a window for an upcoming trip.
Comb various websites for the value of certain assets I have, so I don't have to load them on a regular basis.

The one negative is that I have to start these tasks – they are not autonomous yet. Some tasks will rerun if you ask, but you have to keep the browser window open – which is somewhat inconvenient.

What You Can't Do With It

I tried quite a few scenarios to see what it can and can't do. As one test, for fun, I gave it my login to the USGA's GHIN service. I asked it to load up the list of players at my golf course and tell me the ten best players by handicap. I also asked it to show me players whose handicaps are under 2, keeping in mind that handicaps can go on the other side of zero, denoted by a "+" symbol. Operator tried its best. It logged in, found the listing of players at the course, and then repeatedly clicked "View More" to add more records to the page. However, it eventually gave up, asking for another, more structured way to find this data. So clearly, it is not good for web scraping or data gathering. It is meant to perform tasks for you.
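For what it's worth, the ranking itself is trivial once the data is structured. The sketch below assumes a hypothetical roster of (name, handicap) pairs with made-up values; it has nothing to do with how GHIN actually exposes data. It treats a leading "+" as a plus handicap, meaning better than scratch, so it sorts ahead of zero.

```python
def handicap_value(text: str) -> float:
    """Convert a displayed handicap to a sortable number.

    A leading "+" marks a plus handicap (better than scratch), so it maps
    to a negative value. Lower numbers mean better players.
    """
    text = text.strip()
    if text.startswith("+"):
        return -float(text[1:])
    return float(text)


def best_players(roster, limit=10, under=None):
    """Return up to `limit` players, best first, optionally below a cutoff."""
    ranked = sorted(roster, key=lambda row: handicap_value(row[1]))
    if under is not None:
        ranked = [row for row in ranked if handicap_value(row[1]) < under]
    return ranked[:limit]


# Hypothetical roster; names and values are invented for illustration.
roster = [
    ("Player A", "4.1"),
    ("Player B", "+1.3"),
    ("Player C", "0.0"),
    ("Player D", "1.9"),
    ("Player E", "12.4"),
]

print(best_players(roster, under=2.0))
```

The hard part for Operator was never the sorting. It was getting every record onto the page in the first place.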

As I said, you also can't have any Operator tasks running in the background. As of today, you have to use the user interface. I'm sure this will be fixed in short order, but for now, it limits some of the automation that this offers. If I had to guess, because this is a limited release, they still have to scale up the infrastructure to do these types of tasks. Once this is widely released, people will rely on it heavily, meaning there will be a need for additional stability in its operation. You can, however, save requests so you can easily start them later.

What Comes Next?

I think that Operator gives us an excellent idea of what the future of AI agents will be across various use cases and types of users. I suspect that in the next few years, we're going to be in a hybrid scenario where websites function more or less in a traditional fashion while website owners and operators spend much of their time optimizing their sites for interaction with these types of agents. I think in the near term, it will happen on the front end, optimizing your code to enhance your AI visibility. Then, it advances to the creation of more APIs and direct connection methods so agents can work with your product or service much quicker.

From the user perspective, once this system is allowed to run in the background and perform tasks for you regularly, we'll see mass adoption of this or similar tools. I think almost anyone who works with a computer every day as part of their job will quickly be able to find 8 to 10 repetitive tasks in their routine. An agent that completes those tasks would be a convenient assistant throughout their day.

I also think that when OpenAI creates an API for Operator, it will unleash a slew of creative solutions built on top of it. But I also think that will take a little while, seeing how they have other products like Sora that still have no API either.

Implications For Web Design

As I indicated before, this system works uniquely. It takes screenshots and then interprets them. I'm not quite sure what this means just yet in terms of how to optimize for such a system. Theoretically, as long as a page renders in a stable, visible form, Operator will be able to function with it. I feel I should reinforce that Operator is not absorbing content and learning (that we know of). It's simply looking at pictures and then determining the subsequent actions. The content that it can scrape is limited. Therefore, it's imperative that website owners continue to optimize their sites to make it easier for automated bots to read the content. As I said above, focusing on AI visibility via optimizations to your code is an essential project to undertake in the near term.
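One common form of this kind of code-level optimization is structured data embedded in the page. The sketch below is an assumption about what such an optimization could look like, using the schema.org vocabulary as JSON-LD. The `product_jsonld` helper and the product values are hypothetical, and whether any particular AI agent reads this markup is not something I can confirm.

```python
import json


def product_jsonld(name: str, price: str, currency: str, url: str) -> str:
    """Build a JSON-LD script tag describing a product with schema.org terms."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }
    # Escape "</" so a stray "</script>" inside a value cannot close the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical product used only for illustration.
print(product_jsonld("Example Widget", "19.99", "USD", "https://example.com/widget"))
```

Markup like this gives text-reading bots an unambiguous description of the page, while the visible layout remains what a screenshot-driven agent works from.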

So, what are some possible implications? Here are a few that come to mind:

Shift in Design Priorities: Assume that a large portion of your users are now bots. Tasks should be clean, fast, and easy to ensure those bots work. Humans will also benefit from this change.
Optimize for Speed: I would assume that quicker load times and optimized code will result in better performance for these systems. I'd also assume that would end up being a ranking factor in terms of preference for AI interfaces.
Navigation: Get users (or bots) deeper, quicker. I saw Operator struggle with large data sets. I also saw it struggle with pagination. These challenges must be fixed.
The Rebirth of Site Search? For some time, I've been wondering what the value of search on a site is - but Operator uses it. More often than you would think.
Operator as a UI/UX Test? I've tried asking it to browse my website and find things - using Operator as a test user is a great way to see how effective your website is.
Rejuvenation of Desktop? Operator uses a desktop experience to browse and utilize websites. Does this mean that desktop isn't dead, after all?
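On the pagination point in the list above, one possible remedy is to tell the client how many records exist and where the next page starts, rather than offering an open-ended "View More" button. This is a minimal sketch under that assumption, using an in-memory list. The `page_of` and `fetch_all` helpers and the response fields are hypothetical, not any real site's API.

```python
def page_of(records, offset=0, limit=25):
    """Return one page plus enough metadata to know when to stop."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    chunk = records[offset:offset + limit]
    next_offset = offset + len(chunk)
    return {
        "items": chunk,
        "total": len(records),
        # None signals that there is nothing left to load.
        "next_offset": next_offset if next_offset < len(records) else None,
    }


def fetch_all(records, limit=25):
    """Walk every page by following next_offset until it runs out."""
    collected, offset = [], 0
    while offset is not None:
        page = page_of(records, offset, limit)
        collected.extend(page["items"])
        offset = page["next_offset"]
    return collected


records = list(range(60))  # stand-in for a long listing
first = page_of(records)
print(first["total"], first["next_offset"])
print(len(fetch_all(records)))
```

With a total and an explicit end marker, a bot (or a person) knows up front whether paging through everything is feasible instead of clicking until it gives up.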

These are just a few focus areas for me in the coming months. Overall, we're in the first inning for this type of system, and time will tell what direction it takes as it evolves.

Wrapping Up

In conclusion, Operator represents an exciting step forward in AI's ability to automate routine digital tasks. While it's not without its limitations—such as needing user interaction to initiate tasks, its reliance on screenshots, and its challenges with complex data manipulation—it offers a glimpse into the future of AI-powered productivity tools. As the technology matures, we can expect more seamless automation, background task execution, and integration with APIs, ultimately revolutionizing how users interact with technology. For now, Operator is a powerful demonstration of what AI agents can achieve, and it challenges both users and developers to rethink how we design systems and workflows in an increasingly AI-driven world.
