Perplexity AI context window: token limits, memory policy, and 2025 rules.

Tokens and context define how much Perplexity can consider at once.

A token is the unit Perplexity uses to count text in and out. A practical estimate for English is 1,000 tokens ≈ 750 words, useful for sizing prompts and outputs before you press enter. The context window is the total token budget Perplexity can hold in a single turn: inputs, outputs, and hidden system/tool text. If your request would exceed that budget, Perplexity must drop older content or shorten the response.
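The word-to-token ratio above can be turned into a quick planning helper. A minimal sketch, assuming the article's ~0.75 words-per-token rule of thumb (actual counts vary by tokenizer and language):

```python
# Rough token estimate for English text, using the planning ratio
# 1,000 tokens ≈ 750 words. Real tokenizers will differ somewhat.
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    words = len(text.split())
    return int(words / words_per_token)

prompt = "word " * 750          # a 750-word input
print(estimate_tokens(prompt))  # prints 1000
```

This is only for budgeting before you press enter; the model's own tokenizer decides the true count.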



The 2025 limits you actually work with.

In the API, Perplexity’s proprietary Sonar models have fixed windows: Sonar 128k, Sonar Reasoning 128k, Sonar Reasoning Pro 128k, Sonar Deep Research 128k, and Sonar Pro 200k. These are hard caps on a single request’s total tokens (input plus output plus overhead).

In the Perplexity app (web and mobile), pasted input is capped at roughly 8,000 tokens per query by default. If you provide longer text, Perplexity converts it to a file so it can handle more context.
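Because these windows are hard caps, it helps to check a planned request against them before sending it. A minimal sketch, using the caps listed above (the lowercase model identifiers are assumptions; confirm them against the current API reference):

```python
# Hard context caps (total tokens per request) for the 2025 Sonar models
# described above. Model ID strings are assumed, not verified.
SONAR_CONTEXT_CAPS = {
    "sonar": 128_000,
    "sonar-reasoning": 128_000,
    "sonar-reasoning-pro": 128_000,
    "sonar-deep-research": 128_000,
    "sonar-pro": 200_000,
}

def fits_window(model: str, input_tokens: int, max_output_tokens: int,
                overhead_tokens: int = 0) -> bool:
    """True if input + output + overhead fits the model's fixed window."""
    return (input_tokens + max_output_tokens
            + overhead_tokens) <= SONAR_CONTEXT_CAPS[model]

print(fits_window("sonar", 120_000, 4_000))   # prints True  (124k <= 128k)
print(fits_window("sonar", 120_000, 10_000))  # prints False (130k > 128k)
```

If a request fails this check, either shorten the input, reduce the requested output, or move to Sonar Pro's larger window.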


In Auto mode with file or image uploads, the app can route to long-context backends and handle up to ~1 million tokens of context for signed-in users. Treat this as an app-level capability achieved via routing—not the native window of a single Sonar model.



Files, Spaces, and what actually enters the window.

Per upload in the app, the limit is 40 MB per file and up to 10 files at once, across common document, spreadsheet, slide, text, code, image, audio, and video formats. Spaces let you stage datasets: Pro can upload up to 50 files per Space (Enterprise tiers allow more). These storage and ingestion limits are not the same as the per-turn context window: at answer time, Perplexity retrieves only relevant slices and injects them into the active turn so that input + output still fit the model’s cap.


Memory and privacy features personalize behavior, not capacity.

Memory (also called Personal Search) can remember preferences and stable details across chats, and you can view, edit, or disable it at any time. Incognito mode keeps sessions out of history. You can also manage whether your data is retained for product improvement or excluded. These controls affect what Perplexity brings into a turn, but they do not increase the number of tokens the model can hold.



Where context is lost and how to notice it early.

As the running total nears the window, Perplexity trims or compresses older turns. Signs include answers that stop referencing earlier IDs, ignore previously agreed constraints, or restate definitions. If a pasted prompt is too long, the app may ask you to upload a file or silently reduce output length to fit the cap.


Practical equivalents help you budget prompts and outputs.

Use 1,000 tokens ≈ ~750 words as a safe planning ratio. A 10-page technical note (~3,000 words) is ~4k tokens, leaving ample room for the answer. A 200-page report pushed through a single turn will exceed most native windows unless you rely on retrieval from files/Spaces and limit the requested output length.
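The budgeting arithmetic above is easy to reproduce. A quick sketch, assuming the 0.75 words-per-token ratio and an assumed ~500 words per page for the report example:

```python
# Planning math: tokens ≈ words / 0.75.
WORDS_PER_TOKEN = 0.75

note_words = 3_000                            # ~10-page technical note
note_tokens = int(note_words / WORDS_PER_TOKEN)
print(note_tokens)                            # prints 4000: ample headroom

report_words = 200 * 500                      # 200 pages at ~500 words/page (assumed)
report_tokens = int(report_words / WORDS_PER_TOKEN)
print(report_tokens)                          # prints 133333: over a 128k window
```

At ~133k tokens, the report alone overruns every 128k Sonar window before any output is generated, which is why retrieval from files or Spaces is the practical route.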



Workflows that preserve context in long projects.

Chunk and summarize: process large sources in sections and carry forward running summaries instead of raw text.

Anchor with IDs: label entities and decisions (e.g., [CLIENT-7F], REQ-142) and reference those anchors instead of pasting long excerpts.

Prefer files to pasting: upload documents to unlock the longer, routed contexts available in Auto mode.

Right-size output: if the input is large, request shorter, structured answers (headings, bullet diffs, JSON fields) to keep the total under the cap.
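The chunk-and-summarize workflow can be sketched as a loop that carries only a compact running summary between turns. The `summarize` function below is a placeholder standing in for a real model call:

```python
# Minimal sketch of chunk-and-summarize: each turn sees only the running
# summary plus one new section, keeping input far below the context cap.
def summarize(text: str, max_words: int = 50) -> str:
    # Placeholder: truncate to max_words words. A real pipeline would
    # call the model here to produce an actual summary.
    return " ".join(text.split()[:max_words])

def rolling_summary(sections: list[str]) -> str:
    summary = ""
    for section in sections:
        summary = summarize(summary + "\n" + section)
    return summary

print(rolling_summary(["a b c", "d e"]))  # prints "a b c d e"
```

Because only the summary is carried forward, total context per turn stays bounded no matter how long the source is.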



API specifics that matter for budgeting.

Pick the Sonar model for the job based on its context length (use Sonar Pro 200k for the largest native window). Remember that tooling and reasoning consume tokens. The search_context_size setting controls how much web evidence is retrieved, but that is separate from the model’s context window; only what’s injected into the turn counts toward the cap. Monitor rate and token quotas if you orchestrate multi-request pipelines.
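Putting the budgeting advice together, a request can size its output from the model's cap before it is sent. A minimal sketch, assuming the OpenAI-compatible Chat Completions payload shape Perplexity documents; the model ID, field names, and the input/overhead figures here are assumptions to illustrate the arithmetic, so check the current API reference before relying on them:

```python
import json

CAP = 200_000            # Sonar Pro's native window, per the text above
INPUT_TOKENS = 150_000   # estimated input, including retrieved slices (assumed)
OVERHEAD = 2_000         # allowance for system/tool text (assumed)

payload = {
    "model": "sonar-pro",
    "messages": [{"role": "user", "content": "Summarize REQ-142 status."}],
    # Size the output so input + output + overhead stays under the cap.
    "max_tokens": min(8_000, CAP - INPUT_TOKENS - OVERHEAD),
}
print(json.dumps(payload, indent=2))
```

Here `max_tokens` resolves to 8,000, comfortably inside the remaining 48k of headroom; if the input estimate grew, the same expression would shrink the output automatically.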


Policy recap that you can plan around.

Treat model windows as hard ceilings: Sonar/Sonar Reasoning/Sonar Reasoning Pro/Sonar Deep Research 128k, Sonar Pro 200k. In the app, expect ~8k tokens for pasted text, 40 MB × 10 files per upload, and up to ~1M tokens of routed context in Auto mode with files. Memory, Spaces, and retrieval expand access to knowledge, not the per-turn token budget. Design prompts and outputs so input + output + retrieved slices fit cleanly within the active window.



____________



DATA STUDIOS

