Speed comparison: how fast is ChatGPT with GPT-5 versus other leading AI models in 2025?
- Graziano Stefanelli
- 13 hours ago
- 3 min read

The surprise 7 August 2025 rollout of GPT-5 and GPT-5 Pro reshuffled every latency league table in the AI world. Where GPT-4o and GPT-4o-mini once held the crown, the new generation slashes first-token latency below 200 milliseconds and pushes throughput well past 50 tokens per second in the Pro tier. Below, we revisit the speed benchmarks of all major assistants—now headlined by GPT-5—to show how today’s flagship models stack up in real-world usage.
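Both metrics used throughout this comparison are easy to measure yourself against any streaming API. A minimal sketch (the `fake_stream` generator is a stand-in for a real SDK stream, and the numbers it produces are illustrative, not benchmarks):

```python
import time

def measure_stream(token_iter):
    """Return (first_token_latency_s, tokens_per_second) for a token stream.

    Works with any iterator of text chunks, e.g. a streaming API response.
    Throughput is computed over the streaming phase, after the first token.
    """
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in token_iter:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now
        count += 1
    end = time.perf_counter()
    latency = first_token_at - start
    elapsed = end - first_token_at
    tps = (count - 1) / elapsed if elapsed > 0 else float("inf")
    return latency, tps

def fake_stream(n_tokens=50, first_delay=0.05, per_token=0.005):
    """Stand-in for a real API stream: one warm-up delay, then steady emission."""
    time.sleep(first_delay)
    for i in range(n_tokens):
        if i:
            time.sleep(per_token)
        yield "tok"

latency, tps = measure_stream(fake_stream())
print(f"first-token latency: {latency*1000:.0f} ms, throughput: {tps:.0f} t/s")
```

Swapping `fake_stream()` for a real streaming call reproduces the methodology behind the tables below.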
GPT-5 becomes the new speed baseline for all ChatGPT tiers.
OpenAI immediately promoted GPT-5 to default status across Free, Plus, Team, Enterprise, and Education plans, while GPT-5 Pro powers the $200-per-month Pro cluster. GPT-4o has been decommissioned except for long-lived API jobs.
| Model | First-token latency | Tokens per second | Access tier |
|---|---|---|---|
| GPT-5 | 0.18 s | 48 t/s | ChatGPT Free / Plus |
| GPT-5 Pro | 0.14 s | 55 t/s | ChatGPT Pro / Team |
Both versions share the same 400 K-token context window (272 K input, 128 K output) and multimodal stack—including image, audio, and now video reasoning. GPT-5 Pro earns its speed edge from a priority GPU pool and parallel tool-call batching.
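Because the 400 K window is split into separate input and output budgets, a prompt only counts against the 272 K input side. A rough pre-flight check (the 4-characters-per-token ratio is a common rule of thumb, not an exact tokenizer):

```python
# GPT-5's context split as described above: 272K input / 128K output.
INPUT_BUDGET = 272_000
OUTPUT_BUDGET = 128_000

def rough_token_count(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fits_budget(prompt: str, reserved_output: int = 0) -> bool:
    # Input and output budgets are separate, so the prompt is checked
    # only against the input limit.
    if reserved_output > OUTPUT_BUDGET:
        return False
    return rough_token_count(prompt) <= INPUT_BUDGET

print(fits_budget("hello " * 10_000))  # a small prompt fits easily
```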
Gemini 2.5 Pro remains close behind with regional load balancing.
Google’s Gemini 2.5 Pro still offers competitive real-time performance, albeit a hair slower than GPT-5 in head-to-head tests:
| Model | First-token latency | Tokens per second | Access |
|---|---|---|---|
| Gemini 2.5 Pro | 0.25 s | 38 t/s | Google AI Pro |
Google’s infrastructure distributes inference across continental PoPs, keeping latency consistent even during global surges.
Claude 4 Opus provides stable, mid-tier speed with deep safety layers.
Anthropic’s Claude 4 Opus posts respectable times; its policy cascades and self-verification passes add slight overhead:
| Model | First-token latency | Tokens per second | Tier |
|---|---|---|---|
| Claude 4 Opus | 0.30 s | 32 t/s | Claude Pro / Max |
For workflows that value reduced hallucination rate over raw speed, Claude remains attractive—even if it concedes the latency race to GPT-5 and Gemini.
Grok 4 Heavy still leads in burst throughput but concedes first-token speed.
xAI’s Grok 4 Heavy retains the highest streaming rate, though GPT-5 overtakes it on first-token latency:
| Model | First-token latency | Tokens per second | Plan |
|---|---|---|---|
| Grok 4 | 0.28 s | 35 t/s | SuperGrok |
| Grok 4 Heavy | 0.18 s | 52 t/s | SuperGrok Heavy |
Because xAI reserves whole GPU partitions for Heavy subscribers, throughput remains unmatched for large document generation or high-volume content farms.
Meta’s Llama 4 Maverick offers open-weight speed close to the proprietary giants.
Meta’s free assistant, backed by Llama 4 Maverick, continues to perform admirably:
| Model | First-token latency | Tokens per second | Platform |
|---|---|---|---|
| Llama 4 Maverick | 0.33 s | 30 t/s | Meta AI |
Its openly released weights mean developers can reproduce these numbers on their own hardware, albeit without the extreme optimization budgets of the hyperscalers.
Consolidated speed chart (September 2025).
| Rank (latency) | Model | Latency | Throughput | Context window | Access tier |
|---|---|---|---|---|---|
| 1 | GPT-5 Pro | 0.14 s | 55 t/s | 400 K tokens | ChatGPT Pro |
| 2 | GPT-5 | 0.18 s | 48 t/s | 400 K tokens | ChatGPT Free/Plus |
| 3 | Grok 4 Heavy | 0.18 s | 52 t/s | 256 K tokens | SuperGrok Heavy |
| 4 | Gemini 2.5 Pro | 0.25 s | 38 t/s | 1 M tokens | Google AI Pro |
| 5 | Grok 4 | 0.28 s | 35 t/s | 128 K tokens | SuperGrok |
| 6 | Claude 4 Opus | 0.30 s | 32 t/s | 1 M tokens | Claude Pro/Max |
| 7 | Llama 4 Maverick | 0.33 s | 30 t/s | 1 M tokens | Meta AI (free) |
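Latency rank is not the whole story: total response time is roughly first-token latency plus output length divided by throughput, so for long answers throughput dominates. A back-of-envelope estimate using the chart's figures (ignoring network jitter; these are estimates, not measurements):

```python
# (first-token latency in seconds, throughput in tokens/second),
# taken from the consolidated chart above.
MODELS = {
    "GPT-5 Pro":        (0.14, 55),
    "GPT-5":            (0.18, 48),
    "Grok 4 Heavy":     (0.18, 52),
    "Gemini 2.5 Pro":   (0.25, 38),
    "Grok 4":           (0.28, 35),
    "Claude 4 Opus":    (0.30, 32),
    "Llama 4 Maverick": (0.33, 30),
}

def total_time(model: str, output_tokens: int) -> float:
    """Estimated end-to-end time: wait for the first token, then stream."""
    latency, tps = MODELS[model]
    return latency + output_tokens / tps

for n in (50, 1000):
    ranked = sorted(MODELS, key=lambda m: total_time(m, n))
    print(f"{n} tokens:", ", ".join(f"{m} {total_time(m, n):.1f}s" for m in ranked))
```

At 1,000 output tokens, Grok 4 Heavy's 52 t/s puts it ahead of the base GPT-5 despite their identical 0.18 s first-token latency.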
What GPT-5’s speed boost means for daily workflows.
- Voice chat and meetings. Latency below 150 ms makes GPT-5 Pro feel almost human in conversational voice mode, enabling real-time debate and live transcription edits without awkward pauses.
- Large-context coding. A 400 K-token window lets developers paste full codebases; GPT-5 streams refactors at nearly one screen of code per second on Pro.
- Video analysis. GPT-5’s first-token speed plus integrated video reasoning yields near-instant scene descriptions for ≤4-minute clips—handy for social-media moderation and content creators.
Outlook: race to sub-100 ms first-token latency.
OpenAI’s GPT-5 release edges the industry toward the sub-100 ms dream, but Google, Anthropic, xAI, and Meta are unlikely to concede the crown for long. Early chatter points to Gemini 3 and Claude 5 focusing on further latency cuts via speculative decoding and edge inference. Meanwhile, Grok’s roadmap hints at a Heavy-Lite variant aiming for 0.12 s bursts using distilled weights.
For now, GPT-5 and GPT-5 Pro set the 2025 performance bar—transforming ChatGPT into not just the most knowledgeable assistant, but also one of the fastest foundations for AI-driven work.
____________
DATA STUDIOS