
Claude context window: token limits, memory policy, and 2025 rules.


Tokens and context are fixed quantities that define how much Claude can consider.

A token is a unit of text Claude uses to count and process inputs and outputs. In English, 1,000 tokens typically equals about 750 words or 2–3 pages of text. The context window is the total number of tokens Claude can retain at once—including prompts, replies, system instructions, tool invocations, and reasoning steps. Once the limit is reached, Claude must forget or trim parts of the conversation to make room for the next step.
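As a rough, back-of-the-envelope illustration of that ratio (a heuristic only, not Claude's actual tokenizer), you can estimate how much of the window a piece of text will occupy from its word count:

```python
# Rough estimate based on the "1,000 tokens ≈ 750 English words" rule of thumb above.
# Illustration only: Claude's real tokenizer will give different counts for any given text.
def estimate_tokens(text: str) -> int:
    return round(len(text.split()) * 1000 / 750)   # ~1.33 tokens per word

note = "The context window covers prompts, replies, system instructions, tools, and reasoning."
print(estimate_tokens(note))                        # a short sentence costs a handful of tokens
print(estimate_tokens(" ".join([note] * 13_000)))   # a long transcript approaches the 200,000-token cap
```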



Claude’s token window is shared between input, output, and internal reasoning.

Every interaction in Claude—via API or claude.ai—uses a fixed context window shared between the user’s input, the assistant’s reply, and any internal logic the model applies to solve a problem. While you may only see the visible input and output, Claude internally consumes tokens for interpreting instructions, summarizing previous turns, and invoking tools or search steps. Staying within budget requires balancing length across all parts.
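If you want to measure the input side of that budget before sending a request, the Anthropic Python SDK exposes a token-counting call. A minimal sketch, assuming an ANTHROPIC_API_KEY in the environment and an illustrative model ID (verify both against the current API documentation):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Count the input tokens a draft request would consume, before actually running it.
count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",   # illustrative model ID
    system="You are a concise technical assistant.",
    messages=[{"role": "user", "content": "Summarize the attached brief in about 300 words."}],
)

# Whatever remains of the 200,000-token window must cover the reply plus any tool or reasoning steps.
print(f"Input alone uses {count.input_tokens} of the 200,000-token window.")
```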



The 2025 token limits for Claude’s most used models.

Claude Sonnet 4 and Claude Opus 4.1, the default models in the Claude app and API, offer a 200,000-token context window, with up to 64,000 output tokens for Sonnet 4 in standard conditions. On Claude Enterprise, the context window extends to 500,000 tokens per conversation when using Sonnet 4.


In the API, Claude 3.7 Sonnet can produce up to 128,000 output tokens when a specific header (anthropic-beta: output-128k-2025-02-19) is set; this option is not available on Claude 4 models. Free-tier users on claude.ai may have lower output ceilings and variable performance, especially during high-load periods.
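In the Python SDK, the 128k beta header above can be passed through the extra_headers argument. A hedged sketch (the model ID is illustrative, and very long outputs are best consumed as a stream; confirm the details against the current SDK documentation):

```python
import anthropic

client = anthropic.Anthropic()

# Opt in to the extended 128k output limit; this header applies to Claude 3.7 Sonnet only
# and has no effect on Claude 4 models, which keep their standard output ceilings.
with client.messages.stream(
    model="claude-3-7-sonnet-20250219",    # illustrative Claude 3.7 Sonnet ID
    max_tokens=128_000,                    # only allowed when the beta header is set
    messages=[{"role": "user", "content": "Write the full report, section by section."}],
    extra_headers={"anthropic-beta": "output-128k-2025-02-19"},
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)    # stream the long output as it arrives
```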



File uploads and knowledge projects are constrained by the context window.

Users can upload up to 20 files per chat, each up to 30 MB, and Claude can reference them in real time. In “Projects,” Anthropic allows unlimited file storage, but Claude can still only pull up to 200k or 500k tokens into any single answer, depending on the plan. For example, if you load five PDFs into a Project, Claude will retrieve only the most relevant sections up to the active token window limit.
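Anthropic does not publish the exact retrieval mechanics, but the budgeting logic is easy to picture: rank the uploaded chunks by relevance to the question, then pack the best ones until the active window would overflow. A simplified sketch, with naive keyword overlap standing in for whatever ranking Projects actually uses:

```python
def estimate_tokens(text: str) -> int:
    # Same rough words-to-tokens heuristic as earlier; illustration only.
    return round(len(text.split()) * 1000 / 750)

def select_chunks(chunks: list[str], query: str, budget_tokens: int) -> list[str]:
    """Pick the most query-relevant chunks that still fit inside the token budget."""
    query_words = set(query.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(query_words & set(c.lower().split())),
                    reverse=True)                      # naive relevance score
    picked, used = [], 0
    for chunk in ranked:
        cost = estimate_tokens(chunk)
        if used + cost <= budget_tokens:
            picked.append(chunk)
            used += cost
    return picked

pdf_chunks = ["Clause 12: either party may terminate with 30 days' written notice ...",
              "Appendix B: pricing schedule and volume discounts for 2025 ..."]
# Leave most of the window free for instructions, the question itself, and the answer.
relevant = select_chunks(pdf_chunks, "termination clauses", budget_tokens=150_000)
```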


In the Claude app, PDFs up to 100 pages may be visually analyzed; longer files fall back to text-only parsing.



Claude’s memory only lasts for one chat—unless you use personalization or projects.

Claude’s base memory is stateless across chats: each session starts clean. Within one conversation, Claude remembers previous turns until the token cap is reached. Anthropic adds project memory and preference settings (e.g., tone, formality, custom instructions), but these features help guide Claude’s behavior; they do not increase the context capacity.


In coding mode (Claude Code), Claude uses structured memory files such as CLAUDE.md to carry project instructions across sessions. These files are loaded into the working context and count toward the token total.
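Because a memory file like CLAUDE.md is injected into the working context, its size is a recurring cost on every turn. A small sketch that estimates that overhead with the same rough heuristic used above (not the real tokenizer):

```python
from pathlib import Path

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~1,000 tokens per 750 English words; illustration only.
    return round(len(text.split()) * 1000 / 750)

claude_md = Path("CLAUDE.md").read_text(encoding="utf-8")
overhead = estimate_tokens(claude_md)

# Whatever the memory file costs is no longer available for code, diffs, and replies.
print(f"CLAUDE.md adds roughly {overhead} tokens to every turn of the coding session.")
```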



When Claude loses context, it’s usually because the window was exceeded.

Every turn in a Claude chat consumes tokens. If your chat grows too long, Claude may stop referring to earlier ideas, repeat definitions, or ignore previous instructions. This is a sign that the conversation has outgrown the window. Claude trims old turns to make room, and the model prioritizes recent inputs. If you need Claude to “remember” something, it must be either summarized and restated within the latest turns, or stored in a project file that can be retrieved as needed.
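One way to do that restating mechanically is to carry a rolling summary forward instead of the full transcript. A minimal sketch, assuming the Anthropic Python SDK and an illustrative model ID; the summarization prompt and turn counts are placeholders:

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"   # illustrative model ID

def compress_history(history: list[dict], keep_last: int = 4) -> list[dict]:
    """Replace all but the most recent turns with a short, Claude-written summary."""
    old, recent = history[:-keep_last], history[-keep_last:]
    if not old:
        return history

    summary = client.messages.create(
        model=MODEL,
        max_tokens=500,
        messages=old + [{"role": "user",
                         "content": "Summarize the key facts and decisions above in under 200 words."}],
    ).content[0].text

    # Restate the summary as the opening user turn so it stays inside the active window.
    return [{"role": "user", "content": f"Summary of the earlier discussion: {summary}"}] + recent
```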

Projects and retrieval features extend access to larger knowledge, not memory.

Anthropic’s Projects feature enables Claude to search through a much larger base of uploaded content, well beyond what fits in the active context. Only the most relevant parts are pulled into the reply. Claude does not memorize these files in the human sense; it performs retrieval-based reasoning, which gives access to more content without raising the token ceiling. This makes Projects ideal for long-term knowledge storage, legal documents, or multi-step workflows where only fragments are needed at each step.



Enterprise plans extend the window but follow the same constraints.

Claude for Work (Team/Enterprise) offers the 500k-token window with Sonnet 4 and includes data protection policies: by default, chat content from these plans is not used for training.


In collaborative workflows, Anthropic recommends structuring prompts with sections (context, task, constraints, expected output) and maintaining a local summary to help restart threads when memory fades. Token budgeting in team workflows should still respect the core limit: input + output + retrieved content ≤ context window.
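Both recommendations are easy to enforce mechanically: build the prompt from named sections and check the inequality before sending. A sketch, assuming the 200k window and the rough token estimator used earlier:

```python
CONTEXT_WINDOW = 200_000   # 200k on Sonnet 4 / Opus 4.1; 500k per conversation on Enterprise with Sonnet 4

def estimate_tokens(text: str) -> int:
    return round(len(text.split()) * 1000 / 750)   # rough heuristic, illustration only

def build_prompt(context: str, task: str, constraints: str, expected_output: str,
                 retrieved: str = "", max_output_tokens: int = 8_000) -> str:
    prompt = (f"Context:\n{context}\n\n"
              f"Task:\n{task}\n\n"
              f"Constraints:\n{constraints}\n\n"
              f"Expected output:\n{expected_output}\n\n"
              f"Retrieved material:\n{retrieved}")
    # Core rule from the text: input + output + retrieved content must stay within the window.
    if estimate_tokens(prompt) + max_output_tokens > CONTEXT_WINDOW:
        raise ValueError("Prompt plus reserved output would exceed the context window; trim or summarize first.")
    return prompt
```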


Practical strategies for working within Claude’s token window.

  • Summarize progressively: if you’re working on a long problem, carry forward compact summaries between steps instead of full text.

  • Anchor with labels: reference earlier information using unique tags (e.g., [CLIENT:XY3]) to help Claude recall the right section without reloading it.

  • Split and stage: divide large documents or plans across multiple prompts and answers, tracking the structure manually or with a shared outline (see the sketch after this list).

  • Watch output size: if your input is large, set expectations for shorter responses. Output counts too.
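A minimal sketch of the split-and-stage idea, which also shows the label-anchoring trick from the second bullet; chunk sizes, tag names, and the file path are illustrative:

```python
def split_document(text: str, words_per_chunk: int = 3_000) -> dict[str, str]:
    """Split a long document into labeled chunks that each fit comfortably in one prompt."""
    words = text.split()
    chunks = {}
    for i in range(0, len(words), words_per_chunk):
        label = f"[DOC:PART{i // words_per_chunk + 1}]"   # stable tag for later reference
        chunks[label] = " ".join(words[i:i + words_per_chunk])
    return chunks

# Shared outline: send one part per turn, then refer back by label instead of re-pasting the text.
parts = split_document(open("contract.txt", encoding="utf-8").read())
print(list(parts))   # e.g. ['[DOC:PART1]', '[DOC:PART2]', ...]
```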



The token window doesn’t grow—but the system around it gets smarter.

The core limit of 200k or 500k tokens, depending on plan, remains a hard cap. But with Projects, extended output, structured files, and prompt engineering, Claude users can simulate longer memory and manage complex inputs effectively. Success depends less on squeezing in more content and more on controlling what gets loaded: chunk by chunk, turn by turn.


