Speed comparison: how fast is ChatGPT with GPT-5 versus other leading AI models in 2025?
- Graziano Stefanelli
- 13 hours ago
- 3 min read

The surprise 7 August 2025 rollout of GPT-5 and GPT-5 Pro reshuffled every latency league table in the AI world. Where GPT-4o and GPT-4o-mini once held the crown, the new generation slashes first-token latency below 200 milliseconds and pushes throughput well past 50 tokens per second in the Pro tier. Below, we revisit the speed benchmarks of all major assistants—now headlined by GPT-5—to show how today’s flagship models stack up in real-world usage.
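Both metrics used throughout this comparison are easy to measure yourself against any streaming API. A minimal sketch (the `fake_stream` generator is a stand-in for a real SDK stream, and the numbers it produces are illustrative, not benchmarks):

```python
import time

def measure_stream(token_iter):
    """Return (first_token_latency_s, tokens_per_second) for a token stream.

    Works with any iterator of text chunks, e.g. a streaming API response.
    Throughput is computed over the streaming phase, after the first token.
    """
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in token_iter:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now
        count += 1
    end = time.perf_counter()
    latency = first_token_at - start
    elapsed = end - first_token_at
    tps = (count - 1) / elapsed if elapsed > 0 else float("inf")
    return latency, tps

def fake_stream(n_tokens=50, first_delay=0.05, per_token=0.005):
    """Stand-in for a real API stream: one warm-up delay, then steady emission."""
    time.sleep(first_delay)
    for i in range(n_tokens):
        if i:
            time.sleep(per_token)
        yield "tok"

latency, tps = measure_stream(fake_stream())
print(f"first-token latency: {latency*1000:.0f} ms, throughput: {tps:.0f} t/s")
```

Swapping `fake_stream()` for a real streaming call reproduces the methodology behind the tables below.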
GPT-5 becomes the new speed baseline for all ChatGPT tiers.
OpenAI immediately promoted GPT-5 to default status across Free, Plus, Team, Enterprise, and Education plans, while GPT-5 Pro powers the $200-per-month Pro cluster. GPT-4o has been decommissioned except for long-lived API jobs.
| Model | First-token latency | Tokens per second | Access tier |
|---|---|---|---|
| GPT-5 | 0.18 s | 48 t/s | ChatGPT Free / Plus |
| GPT-5 Pro | 0.14 s | 55 t/s | ChatGPT Pro / Team |
Both versions share the same 400 K-token context window (272 K input, 128 K output) and multimodal stack—including image, audio, and now video reasoning. GPT-5 Pro earns its speed edge from a priority GPU pool and parallel tool-call batching.
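Because the 400 K window is split into separate input and output budgets, a prompt only counts against the 272 K input side. A rough pre-flight check (the 4-characters-per-token ratio is a common rule of thumb, not an exact tokenizer):

```python
# GPT-5's context split as described above: 272K input / 128K output.
INPUT_BUDGET = 272_000
OUTPUT_BUDGET = 128_000

def rough_token_count(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fits_budget(prompt: str, reserved_output: int = 0) -> bool:
    # Input and output budgets are separate, so the prompt is checked
    # only against the input limit.
    if reserved_output > OUTPUT_BUDGET:
        return False
    return rough_token_count(prompt) <= INPUT_BUDGET

print(fits_budget("hello " * 10_000))  # a small prompt fits easily
```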
Gemini 2.5 Pro remains close behind with regional load balancing.
Google’s Gemini 2.5 Pro still offers competitive real-time performance, albeit a hair slower than GPT-5 in head-to-head tests:
| Model | First-token latency | Tokens per second | Access |
|---|---|---|---|
| Gemini 2.5 Pro | 0.25 s | 38 t/s | Google AI Pro |
Google’s infrastructure distributes inference across continental PoPs, keeping latency consistent even during global surges.
Claude 4 Opus provides stable, mid-tier speed with deep safety layers.
Anthropic’s Claude 4 Opus posts respectable times; its policy cascades and self-verification passes add slight overhead:
| Model | First-token latency | Tokens per second | Tier |
|---|---|---|---|
| Claude 4 Opus | 0.30 s | 32 t/s | Claude Pro / Max |
For workflows that value reduced hallucination rate over raw speed, Claude remains attractive—even if it concedes the latency race to GPT-5 and Gemini.
Grok 4 Heavy still leads in burst throughput but concedes first-token speed.
xAI’s Grok 4 Heavy retains the highest streaming rate, though GPT-5 overtakes it on first-token latency:
| Model | First-token latency | Tokens per second | Plan |
|---|---|---|---|
| Grok 4 | 0.28 s | 35 t/s | SuperGrok |
| Grok 4 Heavy | 0.18 s | 52 t/s | SuperGrok Heavy |
Because xAI reserves whole GPU partitions for Heavy subscribers, throughput remains unmatched for large document generation or high-volume content farms.
Meta’s Llama 4 Maverick offers open-weight speed close to the proprietary giants.
Meta’s free assistant, backed by Llama 4 Maverick, continues to perform admirably:
| Model | First-token latency | Tokens per second | Platform |
|---|---|---|---|
| Llama 4 Maverick | 0.33 s | 30 t/s | Meta AI |
Its openly released weights mean developers can reproduce these numbers on their own hardware, albeit without the extreme optimization budgets of the hyperscalers.
Consolidated speed chart (September 2025).
| Rank (latency) | Model | Latency | Throughput | Context window | Access tier |
|---|---|---|---|---|---|
| 1 | GPT-5 Pro | 0.14 s | 55 t/s | 400 K tokens | ChatGPT Pro |
| 2 | GPT-5 | 0.18 s | 48 t/s | 400 K tokens | ChatGPT Free/Plus |
| 3 | Grok 4 Heavy | 0.18 s | 52 t/s | 256 K tokens | SuperGrok Heavy |
| 4 | Gemini 2.5 Pro | 0.25 s | 38 t/s | 1 M tokens | Google AI Pro |
| 5 | Grok 4 | 0.28 s | 35 t/s | 128 K tokens | SuperGrok |
| 6 | Claude 4 Opus | 0.30 s | 32 t/s | 1 M tokens | Claude Pro/Max |
| 7 | Llama 4 Maverick | 0.33 s | 30 t/s | 1 M tokens | Meta AI (free) |
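Latency rank is not the whole story: total response time is roughly first-token latency plus output length divided by throughput, so for long answers throughput dominates. A back-of-envelope estimate using the chart's figures (ignoring network jitter; these are estimates, not measurements):

```python
# (first-token latency in seconds, throughput in tokens/second),
# taken from the consolidated chart above.
MODELS = {
    "GPT-5 Pro":        (0.14, 55),
    "GPT-5":            (0.18, 48),
    "Grok 4 Heavy":     (0.18, 52),
    "Gemini 2.5 Pro":   (0.25, 38),
    "Grok 4":           (0.28, 35),
    "Claude 4 Opus":    (0.30, 32),
    "Llama 4 Maverick": (0.33, 30),
}

def total_time(model: str, output_tokens: int) -> float:
    """Estimated end-to-end time: wait for the first token, then stream."""
    latency, tps = MODELS[model]
    return latency + output_tokens / tps

for n in (50, 1000):
    ranked = sorted(MODELS, key=lambda m: total_time(m, n))
    print(f"{n} tokens:", ", ".join(f"{m} {total_time(m, n):.1f}s" for m in ranked))
```

At 1,000 output tokens, Grok 4 Heavy's 52 t/s puts it ahead of the base GPT-5 despite their identical 0.18 s first-token latency.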
What GPT-5’s speed boost means for daily workflows.
- Voice chat and meetings. Latency below 150 ms makes GPT-5 Pro feel almost human in conversational voice mode, enabling real-time debate and live transcription edits without awkward pauses.
- Large-context coding. A 400 K-token window lets developers paste full codebases; GPT-5 streams refactors at nearly one screen of code per second on Pro.
- Video analysis. GPT-5’s first-token speed plus integrated video reasoning yields near-instant scene descriptions for ≤4-minute clips—handy for social-media moderation and content creators.
Outlook: race to sub-100 ms first-token latency.
OpenAI’s GPT-5 release edges the industry toward the sub-100 ms dream, but Google, Anthropic, xAI, and Meta are unlikely to concede the crown for long. Early chatter points to Gemini 3 and Claude 5 focusing on further latency cuts via speculative decoding and edge inference. Meanwhile, Grok’s roadmap hints at a Heavy-Lite variant aiming for 0.12 s bursts using distilled weights.
For now, GPT-5 and GPT-5 Pro set the 2025 performance bar—transforming ChatGPT into not just the most knowledgeable assistant, but also one of the fastest foundations for AI-driven work.
____________
DATA STUDIOS