ChatGPT Agent is now widely available: a critical examination of its capabilities and limitations in 2025
- Graziano Stefanelli
- Aug 5
- 8 min read

OpenAI’s autonomous assistant enters the mainstream, but its actual performance is defined by a mixture of cautious automation, real-world obstacles, and the need for persistent human oversight.
With the August 2025 expansion, ChatGPT Agent has finally reached most Plus, Team, and Pro subscribers, ushering in a new phase of practical AI autonomy. The promise is straightforward but ambitious: allow ChatGPT to go beyond static Q&A, transforming it into a hands-on, browser-based agent capable of acting directly on the open web. In theory, this should mean not just information retrieval, but genuine task completion—automating sequences like form submissions, data extraction, itinerary planning, and much more.
However, beneath this technological leap lies a complex and still-immature reality. While Agent marks a new frontier in generative AI, its daily use reveals a landscape shaped by technical limitations, strict safety protocols, and a developmental curve that often makes human supervision not just recommended, but absolutely necessary. The following sections dissect, in granular detail, what ChatGPT Agent can actually accomplish, where it falters, and how its operational style is shaping early user sentiment and industry expectations.
ChatGPT Agent brings action and automation, but under strict procedural logic
The Agent’s capacity to interact with the web is built on a system of granular, sequential actions that attempt to replicate the step-by-step logic of a cautious human user.
At the heart of ChatGPT Agent is a transition from passive, information-based chat to active, procedural workflow automation. Rather than simply suggesting steps or providing instructions, the Agent is designed to orchestrate browser actions directly. This is made possible by OpenAI’s integration of a secure headless browser, tightly linked to the GPT-4o model, which interprets user prompts as high-level objectives and then decomposes them into a set of concrete, ordered steps.
Every action is mediated by DOM parsing and real-time feedback. The Agent reads and interprets the structure of a webpage, identifying interactive elements such as buttons, links, text fields, drop-downs, and checkboxes. It can scroll through content, make selections, input data, and navigate through multi-step workflows, always displaying its reasoning and progress in a transparent log visible to the user.
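The element-identification pass described above can be sketched with Python's standard-library HTML parser. This is a simplified illustration of the general technique, not OpenAI's implementation; the sample form and tag set are invented for the example.

```python
from html.parser import HTMLParser

# Tags an automation agent would typically treat as interactive targets.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}

class InteractiveElementFinder(HTMLParser):
    """Collect interactive elements from a page, similar in spirit to
    the DOM pass an agent performs before choosing its next action."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append({"tag": tag, "attrs": dict(attrs)})

# A hypothetical sign-up form standing in for a real page.
html = """
<form action="/signup">
  <input type="text" name="email">
  <select name="plan"><option>Plus</option></select>
  <button type="submit">Join</button>
</form>
"""

finder = InteractiveElementFinder()
finder.feed(html)
for el in finder.elements:
    print(el["tag"], el["attrs"])
```

A real agent would go further, scoring each candidate element against the user's stated goal before deciding where to click or type, but the inventory step looks broadly like this.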
The user is never fully out of the loop: at any point, it’s possible to pause the process, redirect the Agent, or stop it altogether. In practice, this keeps the user as the ultimate decision-maker, with the Agent functioning as an obedient and methodical executor of commands. The paradigm is less about replacing human agency, and more about amplifying productivity through the careful delegation of routine, repeatable actions.
Practical strengths: linear, well-defined, and low-friction tasks are where Agent excels
The system performs reliably in contexts that minimize unpredictability and complexity, favoring scenarios that mirror the flow of classic web automation scripts.
User feedback and testing from the past week converge on a clear pattern: ChatGPT Agent is effective when assigned tasks that fit linear, deterministic workflows—where each step is logically connected, the outcome is predictable, and the website itself is not actively defending against bots or automated interactions.
Examples abound in real-world usage:
- Gathering tabular or structured data from open-access directories, such as real estate listings, university course catalogs, or public schedules.
- Completing registration forms, booking requests, newsletter sign-ups, or customer contact pages where authentication is light or absent.
- Drafting emails or simple reports and then submitting them through static web forms.
- Generating and aggregating travel itineraries by pulling from official city or airline portals, piecing together data from multiple sources into a unified summary.
- Navigating documentation archives, public regulatory sites, or static informational pages to extract key policy, compliance, or market data.
The critical success factor is the absence of dynamic content barriers. The Agent is most reliable when the target site is statically rendered, minimally interactive, and free from traps like popup modals, advanced JavaScript widgets, or ever-changing page layouts.
In these optimal cases, Agent essentially serves as a tireless, fastidious assistant—carrying out repetitive click-and-type actions, copy-pasting information, and summarizing results without the tedium of manual effort. This creates real value for professionals who regularly face large volumes of routine web work, especially in research, administration, or entry-level data analysis.
Performance bottlenecks: the Agent’s speed and error rate limit ambitious automation
Despite the promise of autonomy, the real-world Agent is marked by deliberateness, latency, and a need for continuous correction, especially as complexity increases.
A recurring theme in user reports is the slowness with which the Agent completes even basic multi-step workflows. The Agent’s operational rhythm is methodical to a fault: before clicking or typing, it parses the page, cross-checks the elements, waits for loads, verifies that its intended action matches what’s visible, and only then proceeds. This process is intentionally conservative, with OpenAI prioritizing safety and predictability over speed. The result, however, is that even “simple” tasks—such as filling a three-page form or navigating across a handful of tabs—can take from one to ten minutes.
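The verify-before-acting rhythm described above — parse, cross-check, wait for loads, confirm, then act — can be illustrated with a small retry loop. This is a hypothetical sketch against a fake in-memory "page"; the selectors, timings, and helper names are invented for illustration and explain the latency: every action pays for repeated state checks.

```python
import time

def wait_for_element(page, selector, timeout=10.0, poll=0.5):
    """Poll until the target element is present and visible,
    mirroring the agent's habit of confirming state before acting."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        el = page.get(selector)
        if el and el.get("visible"):
            return el
        time.sleep(poll)  # each retry adds latency but avoids blind clicks
    raise TimeoutError(f"{selector} never became actionable")

def cautious_click(page, selector):
    el = wait_for_element(page, selector)
    # Re-check immediately before acting: the page may have changed.
    if not el.get("visible"):
        raise RuntimeError("element disappeared between check and click")
    el["clicked"] = True
    return el

# A fake, pre-rendered page standing in for real DOM state.
page = {"#submit": {"visible": True, "clicked": False}}
cautious_click(page, "#submit")
print(page["#submit"]["clicked"])  # → True
```

Multiply that check-wait-recheck cycle across every field of a three-page form and the one-to-ten-minute completion times reported by users become easy to account for.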
The performance overhead grows sharply with task complexity:
- Multi-page navigation with stateful forms leads to frequent stalls if an unexpected modal or timeout appears.
- Any workflow involving asynchronous content loading (such as infinite scrolling, live updates, or dynamically populated dropdowns) risks breaking the action chain, forcing manual intervention.
- When interacting with poorly labeled or overlapping UI elements, the Agent’s selection logic may fail, leading to incorrect clicks, duplicated entries, or skipped steps.
- Data extraction, while generally robust for tables and lists, can become unreliable with complex nested layouts or embedded content (e.g., iframes, carousels, or expandable sections).
What this means for users is a workflow that demands vigilance. The Agent requires regular supervision, both to recover from predictable hiccups and to ensure that the final output aligns with the user’s real intent. Most professionals using Agent today treat it less as a set-and-forget automation tool, and more as a “co-pilot” whose work must be reviewed and, frequently, corrected.
Structural blockers: login pages, paywalls, CAPTCHA, and aggressive anti-bot infrastructure
The Agent’s capabilities are fundamentally restricted by both technical and ethical constraints, leading to systematic failure on protected or commercial platforms.
A major limitation—often encountered in day-to-day use—is the Agent’s inability to function on sites that employ even basic anti-bot measures or require complex authentication. OpenAI has pre-emptively disabled Agent access for many high-traffic domains, especially those with a history of blocking automation.
In practical terms, the Agent is blocked or non-functional when:
- The site explicitly prohibits bot access in its robots.txt file (notably, Amazon, LinkedIn, and major social platforms).
- Login workflows involve multi-factor authentication, SSO redirects, CAPTCHA challenges, or OAuth grants.
- The site UI is built with technologies that obscure traditional web elements, such as HTML canvas rendering, React/Angular SPAs, or heavy client-side scripting.
- Persistent sessions are required (for example, shopping carts or personalized dashboards) that cannot be initialized without user credentials.
- Aggressive anti-scraping or anti-automation logic detects bot-like behavior and triggers IP blocks, rate limits, or honeypot traps.
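The robots.txt compliance check mentioned above can be reproduced with Python's standard library. The rules below are a made-up example file, not the actual policy of Amazon, LinkedIn, or any real site:

```python
from urllib.robotparser import RobotFileParser

# An illustrative robots.txt that bans one crawler outright
# and fences everyone else off from checkout pages.
robots_txt = """\
User-agent: ExampleAgent
Disallow: /

User-agent: *
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("ExampleAgent", "/products/123"))  # → False
print(parser.can_fetch("OtherBot", "/products/123"))      # → True
print(parser.can_fetch("OtherBot", "/checkout/cart"))     # → False
```

A compliant agent runs a check like this before every navigation; a blanket `Disallow: /` for its user agent is enough to take an entire domain off the table, which is why whole categories of high-traffic sites fail immediately.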
Real-world user attempts to automate product lookups or purchases on Amazon, ticket searches on event platforms, or job applications on professional networks almost universally end in error messages, stuck sessions, or silent failures. This is not just a technical limitation—it reflects a deliberate ethical and legal compliance strategy by OpenAI to avoid misuse and litigation.
Even on semi-open platforms, the Agent’s inability to persist authentication cookies or manage session tokens means it cannot reliably perform tasks that, for a human, would be trivial after login. Any workflow requiring even moderate identity persistence or multi-step secure access is, for now, out of reach.
Safety, privacy, and oversight: OpenAI enforces strong boundaries, but human vigilance remains essential
Every Agent session is sandboxed and monitored, with transparent logging and enforced isolation—yet practical privacy and security risks still demand active user management.
OpenAI has implemented a robust array of guardrails around the Agent’s operation, designed to protect both user data and the wider ecosystem from accidental or malicious actions. These include:
- Domain-level access controls, with ongoing updates to permitted and blacklisted sites.
- Session isolation, ensuring that each Agent run is separated from prior activity, with no persistent cookies or credentials carried across tasks.
- Full activity logging, displaying every interaction, decision point, and web element action in a visible transcript—users can see every click, keystroke, and navigation event in real time.
- Strict prohibition of file downloads, uploads, or direct local device interaction, with all actions confined to the browser environment controlled by OpenAI.
- Automatic session termination if the Agent encounters forbidden domains, ambiguous navigation flows, or appears to be stuck.
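The kind of transparent action transcript described above can be sketched as a small structured log. This is an illustrative stand-in, not OpenAI's logging format; the action names and targets are invented:

```python
from datetime import datetime, timezone

class ActionLog:
    """A minimal transcript of agent actions, in the spirit of the
    visible step-by-step log users can watch in real time."""

    def __init__(self):
        self.entries = []

    def record(self, action, target, detail=""):
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "target": target,
            "detail": detail,
        })

    def transcript(self):
        # One human-readable line per recorded action.
        return "\n".join(
            f'{e["action"]:>8}  {e["target"]}  {e["detail"]}'.rstrip()
            for e in self.entries
        )

log = ActionLog()
log.record("navigate", "https://example.com/signup")
log.record("type", "input[name=email]", "user@example.com")
log.record("click", "button[type=submit]")
print(log.transcript())
```

The value of such a transcript is auditability: a reviewer can replay exactly what the agent did and when, which is precisely what makes the log useful in enterprise oversight — though, as noted below, it complements rather than replaces human review.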
Despite these protections, practical risk remains—especially for users who provide broad or ambiguous instructions, or who attempt to connect Agent with other workflow tools or APIs. Experts in cybersecurity have pointed out the latent danger of prompt injection attacks, data exfiltration via poorly crafted tasks, or accidental leaks of sensitive information through complex chained actions.
Within enterprise settings, this means that any deployment of Agent requires not just technical integration, but clear organizational policies, user training, and regular monitoring. The transparency of the action log is a significant advantage, but it is no substitute for responsible human oversight.
OpenAI’s roadmap: evolving features, dynamic constraints, and operational caveats for the near future
The company’s public communications point to an Agent that will change rapidly—expanding capabilities, adjusting limitations, and refining its operational style based on real-world usage feedback.
OpenAI’s most recent official updates (as of August 4, 2025) underscore the provisional nature of the Agent. Users are explicitly warned that:
- Performance may degrade or become temporarily unavailable as demand spikes and backend infrastructure is adjusted.
- Some site categories and features may be enabled or disabled dynamically, in response to misuse, user complaints, or observed reliability issues.
- Usage rate limits are likely to be enforced more strictly as more users activate Agent, particularly for resource-intensive or repetitive tasks.
- New features are in development, including customizable Agent behavior profiles (trading off speed versus safety), and a planned “Observer Mode” that would allow the Agent to watch and learn from user-initiated browsing before automating similar tasks.
- Integration with advanced memory systems and more sophisticated context management is on the roadmap, but no concrete timeline is given.
The company has also acknowledged, candidly, that hiccups, mistakes, and service interruptions are to be expected as the Agent matures. OpenAI positions this as a phase of “learning in public,” with iterative improvements shaped by user experience data and the evolving technical landscape.
The current reality: useful for narrow tasks, dependent on hands-on management, and not (yet) a true digital employee
ChatGPT Agent, in its present state, acts as a cautious but helpful assistant for routine, linear web work, but falls short of seamless, independent automation.
For professionals and power users, the Agent is best understood as a cooperative automation tool—a way to save time on well-bounded, repetitive online workflows, but not a magic button for unattended task completion. Its strengths lie in its transparency, its careful handling of web actions, and its potential to free up users from drudge work.
Yet its dependency on human guidance, its fragility when confronted with unexpected complexity, and its many technical and policy-driven boundaries make it clear that ChatGPT Agent is not a substitute for dedicated browser automation frameworks, RPA tools, or human assistants—at least not yet.
Most user feedback reflects a mix of excitement and frustration: the tool’s potential is obvious, but so are its weaknesses. It works best when the user is willing to invest time supervising, correcting, and iteratively refining their requests. In other words, it is more like a new junior hire than an autonomous digital colleague: eager, helpful, and willing to learn, but in need of patient oversight at every step.
_________
DATA STUDIOS