Microsoft Copilot: Speed comparison of leading AI models
- Graziano Stefanelli

Microsoft’s Copilot platform now integrates some of the fastest and most capable large language models available. With the recent transition to GPT-5 as the default across all major Copilot surfaces, speed has become a key differentiator for users balancing performance with reasoning quality. Understanding how GPT-5 compares to other state-of-the-art systems helps administrators and advanced users make informed decisions when selecting models in Copilot Studio or benchmarking against external tools.
GPT-5 becomes the default in Copilot and delivers higher throughput.
In 2025, Microsoft began routing Copilot requests to GPT-5 as the standard model, replacing GPT-4o and later GPT-4.1 on most endpoints. GPT-5 combines enhanced reasoning depth with improved tokens-per-second (tps) output, reaching roughly 79 tps in independent testing. This translates into faster paragraph-level streaming once the first token is generated, though some reports note that its time-to-first-token (TTFT) can be marginally higher than GPT-4.1's on certain API routes.
For Copilot users, this means:
- Quicker progression through multi-paragraph drafts.
- A slightly longer initial pause before output begins in reasoning-heavy prompts.
- Better efficiency in long, structured responses without sacrificing quality.
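The two metrics behind these observations, TTFT and tps, can be measured against any streaming endpoint by recording when each token arrives. A minimal sketch (the function name and timestamp format are illustrative, not part of any Copilot or OpenAI API):

```python
from typing import Iterable, Tuple

def stream_metrics(request_start: float, token_timestamps: Iterable[float]) -> Tuple[float, float]:
    """Compute time-to-first-token (TTFT) and tokens-per-second (tps)
    from the wall-clock timestamps at which each streamed token arrived."""
    timestamps = list(token_timestamps)
    if not timestamps:
        raise ValueError("no tokens received")
    ttft = timestamps[0] - request_start          # delay before streaming begins
    duration = timestamps[-1] - timestamps[0]     # window after the first token
    tps = (len(timestamps) - 1) / duration if duration > 0 else float("inf")
    return ttft, tps

# Simulated example: 80 tokens, first after 0.5 s, then one every 12.5 ms.
start = 0.0
stamps = [0.5 + i * 0.0125 for i in range(80)]
ttft, tps = stream_metrics(start, stamps)
print(f"TTFT={ttft:.3f}s, tps={tps:.1f}")  # TTFT=0.500s, tps=80.0
```

Note that TTFT and tps are independent: a model can start slowly yet stream quickly, which is exactly the GPT-5 profile the reports describe.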
GPT-4.1 remains in specific Microsoft workloads.
While GPT-5 is the main default, GPT-4.1 is still in use for some Power Platform and AI Builder tasks. It offers a slightly faster TTFT than GPT-5 on short-form outputs and has been optimised for certain enterprise workloads. In Copilot Studio, administrators can still route flows to GPT-4.1 if ultra-low latency for short answers matters more than GPT-5's expanded reasoning capabilities.
How Copilot’s speed compares to external top models.
| Model (2025) | Provider | Typical tps (independent tests) | TTFT profile | Notes for Copilot comparison |
|---|---|---|---|---|
| GPT-5 | OpenAI | ~79 tps | Moderate | Default in Copilot; higher throughput but slightly longer first token on some workloads. |
| GPT-4.1 | OpenAI | ~70–75 tps | Low | Still present in AI Builder; faster start for short prompts. |
| Claude Opus 4.1 | Anthropic | ~50–55 tps | Low | Not in Copilot; strong reasoning and coding benchmarks. |
| Gemini 2.5 Pro | Google | ~143 tps | Moderate | Not in Copilot; extremely high throughput in streaming mode. |
| Gemini 2.5 Flash | Google | >150 tps | Very low | Speed-optimised; trades reasoning depth for latency. |
| Grok 4 | xAI | ~75 tps | Moderate | Balanced output speed; heavier variant slower. |
| Llama 4 Maverick | Meta | 130–175 tps (public endpoints) | Low | Can exceed 1,000 tps on tuned enterprise hardware. |
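Throughput figures like these translate into end-to-end response time roughly as TTFT plus tokens divided by tps. A back-of-envelope calculator using the approximate throughputs above (the TTFT values of 0.6 s and 0.8 s are illustrative assumptions, not measured figures):

```python
def response_time(tokens: int, tps: float, ttft: float) -> float:
    """Estimate total wall-clock time for a streamed response:
    first-token delay plus steady-state generation time."""
    return ttft + tokens / tps

# A 500-token answer at ~79 tps (GPT-5) vs ~143 tps (Gemini 2.5 Pro),
# with assumed TTFTs of 0.6 s and 0.8 s respectively.
print(round(response_time(500, 79, 0.6), 1))   # 6.9
print(round(response_time(500, 143, 0.8), 1))  # 4.3
```

The arithmetic shows why tps dominates for long outputs while TTFT dominates for short ones: on a 20-token answer, the same two configurations would differ by fractions of a second.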
Ultra-fast external models highlight the hosting factor.
Some open-weight systems such as Llama 4 Maverick can outperform GPT-5 in raw throughput when deployed on optimised hardware, such as NVIDIA Blackwell DGX B200 servers. However, Copilot’s speed is determined not just by the model but also by Azure’s hosting configuration, throttling policies, and the use of reasoning modes.
For example:
- A locally deployed Llama 4 may stream tokens faster than GPT-5, but without Copilot's Microsoft Graph integration it lacks context from organisational data.
- Gemini 2.5 Flash can achieve extremely low TTFT, yet in practice this advantage is less relevant for enterprise workflows that prioritise richer outputs.
The trade-off between speed and reasoning.
In Copilot Studio, certain model settings such as GPT-5 Thinking or GPT-5 Pro introduce a deliberate delay in exchange for higher-quality, multi-step reasoning. This is particularly useful for:
- Complex financial modelling.
- Drafting multi-section reports.
- Analysing large datasets with chain-of-thought logic.
While these modes are slower, the benefit is a reduction in factual gaps and more coherent structuring over long contexts.
Choosing the right balance in Microsoft Copilot.
When configuring Copilot for a department or project, speed selection should be aligned with task type:
| Task type | Recommended model setting | Rationale |
|---|---|---|
| Short, transactional Q&A | GPT-4.1 or GPT-5 standard | Lower TTFT and adequate reasoning. |
| Real-time meeting summaries | GPT-5 standard | Balances context handling with steady streaming. |
| Deep technical analysis | GPT-5 Pro / Thinking | Allows extended reasoning at the cost of latency. |
| High-volume document drafting | GPT-5 standard | Faster progression through long text. |
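Teams that script their own request routing outside Copilot Studio sometimes encode this kind of mapping directly. A hypothetical helper mirroring the table above (the model identifiers and task categories are illustrative; Copilot Studio itself does not expose a Python routing API):

```python
# Illustrative task-to-model routing table; names are assumptions,
# not Copilot Studio configuration keys.
RECOMMENDED_MODEL = {
    "short_qa": "gpt-4.1",
    "meeting_summary": "gpt-5",
    "deep_analysis": "gpt-5-thinking",
    "document_drafting": "gpt-5",
}

def pick_model(task_type: str) -> str:
    """Return the recommended model setting for a task type,
    falling back to the GPT-5 default for anything unrecognised."""
    return RECOMMENDED_MODEL.get(task_type, "gpt-5")

print(pick_model("short_qa"))      # gpt-4.1
print(pick_model("ad_hoc_chat"))   # gpt-5
```

Defaulting to GPT-5 for unknown tasks mirrors Copilot's own behaviour of treating it as the standard model.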
With GPT-5 now firmly embedded in Microsoft Copilot, most users will notice a net gain in throughput compared to earlier versions, alongside a minor shift in first-token latency for certain reasoning-heavy prompts. For teams where milliseconds matter, Copilot Studio’s routing flexibility offers a way to optimise per scenario—while keeping the benefits of Microsoft’s data integration and enterprise compliance in place.