
Perplexity AI: All Available Models, Modes, and How They Differ in Late 2025


Perplexity AI has become one of the most distinctive AI assistants of late 2025, combining search grounding with fast reasoning. Instead of relying on a single model, Perplexity operates through a multi-model architecture, dynamically routing queries to different engines depending on the task — conversational, research, coding, or enterprise.

Understanding which models Perplexity uses, and when, is essential for evaluating accuracy, performance, and privacy. Below is a detailed breakdown of all available Perplexity AI models, their configurations, and how they’re used across free and paid plans.


How Perplexity AI’s model system works.

Perplexity uses a model-routing framework, meaning the assistant automatically decides which large language model (LLM) should handle your query. This system balances speed, accuracy, and cost-efficiency depending on what you ask.

The routing logic considers:

Question type — factual lookup, reasoning, or creative generation.

Expected output length — short summary vs deep analytical response.

User plan — free vs Pro subscription.

Mode selection — Quick, Pro, or Deep Research mode.

Each mode triggers a different backend model or model combination, with distinct capabilities.
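The routing decision described above can be sketched as a small rule table. This is an illustrative sketch only, not Perplexity's actual implementation: the function name is invented, and the model identifiers are lowercased from the names used in this article.

```python
# Illustrative sketch of mode- and plan-based model routing, as described
# above. This is NOT Perplexity's real routing code; the selection rules
# are simplified assumptions and the model names come from this article.

def route_query(mode: str, is_pro: bool) -> str:
    """Pick a backend model for a query based on mode and subscription plan."""
    if mode == "deep_research":
        if not is_pro:
            raise PermissionError("Deep Research requires a Pro plan")
        return "gpt-5"
    if mode == "pro":
        if not is_pro:
            raise PermissionError("Pro Search requires a Pro plan")
        return "claude-4.5-sonnet"
    # Quick Mode is available on both tiers and favors the fastest model.
    return "mistral-large-2"
```

For example, `route_query("quick", is_pro=False)` returns the fast default, while switching the same user to `"pro"` would raise a plan error rather than silently downgrading.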


All models currently used by Perplexity AI.

Perplexity does not build its own foundation models. Instead, it integrates several best-in-class systems under licensing agreements and API partnerships. As of late 2025, these include:

| Model / Provider | Used In | Context Window (tokens) | Primary Function | Availability |
|---|---|---|---|---|
| OpenAI GPT-5 | Pro / Deep Research | 128,000 | Long-form reasoning, grounded answers | Paid tier |
| Claude 4.5 Sonnet | Pro Mode | 200,000 | Document reasoning, summarization | Paid tier |
| Mistral Large 2 | Quick / API | 32,000 | Fast general Q&A and summaries | Free & Pro |
| Gemini 2.5 Flash | Quick Mode (mobile/web) | 1,000,000 (streamed) | Fast short-form queries, visual context | Free tier |
| Perplexity Internal Blend (Mixtral) | Default fallback | 16,000 | Hybrid summarization with citation routing | Free tier |
| OpenAI o4-mini / GPT-4o-mini | API routing for speed | 128,000 | Conversational tasks and follow-ups | Both tiers |

This blended configuration allows Perplexity to maintain high uptime and rapid response times while offering a degree of model diversity that single-model assistants cannot match.


Free-tier model lineup.

Perplexity’s free plan is one of the strongest no-cost AI offerings in 2025. It gives users access to a rotating combination of models optimized for quick responses, real-time search, and summary accuracy.

Current free-tier setup:

Primary model: Mistral Large 2 (for fast question-answering).

Secondary model: Gemini 2.5 Flash (for multi-modal and visual tasks).

Internal blend: Perplexity’s own citation-enhanced layer, which fetches live sources via its search stack.

This trio allows free users to perform real-time fact-checking, generate source-based summaries, and run web-grounded Q&A without paying for a Pro subscription.

Even on the free plan, Perplexity integrates live web retrieval, meaning every answer includes cited references. This grounding layer is independent of the LLM and ensures factual alignment.


Perplexity Pro and Deep Research model stack.

The Pro plan — priced at roughly $20 per month — unlocks the full range of high-end reasoning models. Subscribers can choose between Pro Search and Deep Research modes, each invoking more capable engines.

Pro Search Mode:

• Default model: OpenAI GPT-5 or Claude 4.5 Sonnet, depending on query type.

• Context size: up to 200,000 tokens, suitable for long articles or academic documents.

• Features: grounded citations, in-line references, and expandable source previews.

• Use case: in-depth analysis, professional research, report writing, and technical summarization.

Deep Research Mode:

• Multi-step chain combining GPT-5 with retrieval layers.

• Automatically performs structured web queries before synthesizing results.

• Can produce long, multi-section essays or analytical reports.

• Context handling of up to 128,000 tokens per chain, refreshed at each retrieval step.

This dual-engine workflow gives Perplexity Pro users a hybrid between search engine and reasoning model — capable of producing detailed multi-source narratives while maintaining citation accuracy.
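The Deep Research chain described above (structured web queries first, then synthesis) can be sketched as a retrieve-then-synthesize loop. Everything here is hypothetical scaffolding: `propose_queries`, `search_web`, and `write_report` are stand-ins for Perplexity's internal search stack and model calls, which are not public.

```python
# Hypothetical sketch of a Deep Research style chain: the model first
# proposes structured sub-queries, live sources are retrieved for each,
# and a final report is synthesized from the collected sources. The
# three callables are invented stand-ins, not real Perplexity APIs.

def deep_research(question, propose_queries, search_web, write_report,
                  max_queries: int = 3) -> dict:
    sources = []
    # Step 1: break the question into structured web queries.
    for query in propose_queries(question)[:max_queries]:
        # Step 2: retrieve live sources for each sub-query.
        sources.extend(search_web(query))
    # Step 3: synthesize a multi-section report grounded in the sources.
    return {"report": write_report(question, sources), "citations": sources}
```

The key design point the article describes is that retrieval happens *before* generation, so every claim in the report can be tied back to an entry in `citations`.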


How model routing behaves across modes.

Perplexity’s system dynamically assigns models behind the scenes based on query structure:

| User Mode | Primary Model | Secondary Model (Fallback) | Best For |
|---|---|---|---|
| Quick Mode | Mistral Large 2 | Gemini 2.5 Flash | Fast web Q&A, short answers |
| Pro Mode | Claude 4.5 Sonnet | GPT-5 | Professional summaries, technical writing |
| Deep Research | GPT-5 | Claude 4.5 Sonnet | Long analytical reports with sources |
| Comet (Experimental) | Internal xAI-derived blend | GPT-4o-mini | Multi-source generative research |

When users switch between modes, the assistant re-routes to the corresponding backend. For instance, asking for a scientific literature review triggers Claude or GPT-5, while asking for a short factual answer triggers Mistral or Gemini. This ensures efficient resource allocation while maintaining consistent quality.


API and developer model access.

Perplexity also provides a developer API that exposes a selection of its integrated models for automation, research tools, and custom apps.

| API Model Option | Provider | Access Type | Typical Usage |
|---|---|---|---|
| Perplexity-Mistral | Mistral | Default | General search + summary endpoints |
| Perplexity-GPT | OpenAI | Paid / token-based | Long context or reasoning pipelines |
| Perplexity-Claude | Anthropic | Paid | Document understanding, Q&A |
| Perplexity-Blended | Proprietary | Beta / invite-only | Hybrid search + synthesis agent |

These endpoints are designed for researchers and developers building grounded reasoning systems or automated literature analyzers. The hybrid “Perplexity-Blended” mode — currently invite-only — merges multiple LLM outputs with Perplexity’s retrieval citations, functioning as a meta-model orchestrator.
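A call to such an endpoint would typically look like a standard chat-completions request. The sketch below only constructs the request payload; the endpoint URL and the model identifier (`perplexity-mistral`, lowercased from the table above) are assumptions for illustration, not documented values, so check the official API reference for real identifiers.

```python
# Build a hypothetical request payload for the developer API described
# above. The endpoint URL and model name are assumptions based on this
# article's table, not verified values from Perplexity's documentation.
import json

API_URL = "https://api.perplexity.ai/chat/completions"  # assumed endpoint

def build_request(question: str, model: str = "perplexity-mistral") -> str:
    """Return the JSON body for a single-turn grounded-answer request."""
    payload = {
        "model": model,  # e.g. the default search + summary model
        "messages": [{"role": "user", "content": question}],
        "stream": False,
    }
    return json.dumps(payload)

# A client would POST this JSON to API_URL with an
# "Authorization: Bearer <api-key>" header.
```

The chat-style `messages` array is the common shape for this class of API; whatever the exact schema, the grounding (citations, retrieval) happens server-side rather than in the client payload.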


Enterprise and Comet modes.

Enterprise users have access to additional models and orchestration layers under the Perplexity Comet program — an internal multi-agent framework first previewed in mid-2025.

Comet mode combines multiple models in a coordinated research workflow:

Retrieval agent: Uses the internal search stack to collect current data.

Synthesis agent: Employs GPT-5 or Claude 4.5 to generate structured insights.

Verification agent: Validates citations against live sources before final output.

This framework is built for enterprise-scale information processing and is currently deployed in finance, consulting, and education sectors for knowledge automation and compliance reporting.
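The three-agent workflow above can be sketched as a pipeline in which each stage's output feeds the next. The agent interfaces here are invented for illustration; only the retrieve → synthesize → verify ordering comes from the description, as Comet's actual orchestration is not publicly documented.

```python
# Illustrative sketch of a Comet-style three-agent pipeline. The agent
# callables are hypothetical stand-ins; only the stage ordering
# (retrieval -> synthesis -> verification) comes from the article.

def comet_pipeline(topic, retrieve, synthesize, verify):
    data = retrieve(topic)            # Retrieval agent: collect current data
    draft = synthesize(topic, data)   # Synthesis agent: structured insights
    # Verification agent: keep only citations that validate against live sources.
    checked = [c for c in draft["citations"] if verify(c)]
    return {"report": draft["text"], "verified_citations": checked}
```

Separating verification into its own stage means a bad citation can be dropped without regenerating the whole report, which is presumably why a multi-agent split is attractive for compliance-style workloads.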


Performance comparison of models inside Perplexity.

| Model | Speed (avg response time) | Reasoning Quality | Citation Fidelity | Multimodal Capability |
|---|---|---|---|---|
| Mistral Large 2 | Fast | Medium | High | Basic |
| Gemini 2.5 Flash | Very Fast | Medium | Moderate | Strong (images, charts) |
| Claude 4.5 Sonnet | Moderate | High | Very High | Text & document reasoning |
| GPT-5 | Slower | Very High | High | Moderate |
| Internal Mixtral Blend | Very Fast | Medium | Very High | Text only |

This layered structure allows Perplexity to deliver real-time responsiveness without sacrificing reasoning depth in its paid tiers. Users effectively get the speed of Mistral with the accuracy of GPT-5 and Claude when needed.


Model updates and rotation policy.

Perplexity updates its internal model lineup frequently, typically every 4–6 weeks. Rotation schedules ensure that the assistant always uses the newest versions from OpenAI, Anthropic, and Mistral without manual user intervention.

Recent update milestones:

August 2025: Integration of GPT-5 across Deep Research mode.

September 2025: Deployment of Claude 4.5 Sonnet as the primary summarization engine.

October 2025: Upgrade of Mistral Large 2 for faster Quick Mode responses.

Planned (Q4 2025): Addition of Gemini 2.5 Pro to an experimental multimodal beta for visual-grounded reasoning.

This rolling update system ensures that users always operate on current-generation intelligence without managing settings or versions.


The bottom line.

Perplexity AI is no longer tied to a single model — it is a dynamic, multi-engine platform designed for context-aware reasoning, live search grounding, and large-scale document analysis.

Free users experience a fast blend of Mistral and Gemini, while Pro and Deep Research subscribers access the reasoning strength of GPT-5 and Claude 4.5, coordinated under Perplexity’s proprietary routing layer.

In late 2025, this architecture makes Perplexity one of the most adaptive assistants available — combining the speed of lightweight models, the depth of premium reasoning systems, and the credibility of verified citations into a single unified experience.


DATA STUDIOS
