Claude: Token limits and context windows
- Graziano Stefanelli
- Aug 22
- 3 min read

Anthropic expands Claude's context capabilities.
Anthropic has recently rolled out significant updates to Claude’s context windows, particularly focusing on Claude Sonnet 4, Claude Opus 4.1, and the Heavy 4 private preview. These enhancements allow developers and enterprises to process substantially larger datasets, manage lengthy document uploads, and handle multimodal inputs more efficiently.
The key update is the introduction of a 1,000,000-token beta context window for Sonnet 4, available through Anthropic's API and Amazon Bedrock, with Vertex AI support planned next. This enables more complex operations, such as analyzing hundreds of pages of PDFs, querying large knowledge bases, and handling mixed media inputs, while maintaining low latency and stable performance.
Claude Sonnet 4 now supports a 1,000,000-token beta context window.
Anthropic’s flagship release includes a 1M-token beta program for Sonnet 4, enabled by passing the beta flag context-1m-2025-08-07 with each request (for example, betas=["context-1m-2025-08-07"] in the SDK). In practical terms, this lets developers load extremely large datasets and sustain longer conversations without manual segmentation.
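As an illustration, here is a minimal sketch of how the beta flag might be passed through the Anthropic Python SDK; the model identifier, message content, and token values are placeholders, and the exact SDK surface may differ slightly across versions.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Request against the 1M-token beta context window for Sonnet 4.
# The betas value mirrors the flag mentioned above; the model ID is illustrative.
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["context-1m-2025-08-07"],
    messages=[
        {
            "role": "user",
            "content": "Summarize the attached corpus of contracts in under 500 words.",
        }
    ],
)

print(response.content[0].text)
```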
Model | Standard Context | Beta Context | Latency (First Token) | Availability
Sonnet 4 | 200,000 tokens | 1,000,000 tokens | ≈ 1.2 sec | Public beta, API + Bedrock
Opus 4.1 | 200,000 tokens | N/A | ≈ 1.9 sec | Production
Heavy 4 Preview | 256,000 tokens | N/A | ≈ 2.1 sec | Private preview only
The 1M-token beta has been released to address growing demand from enterprise users who work with large-scale legal documents, financial datasets, R&D archives, and multimedia-driven workflows. For now, Anthropic reserves approximately 6% of this window for system and safety tokens, leaving ≈940,000 tokens available to users.
Opus 4.1 and Heavy 4 maintain high-capacity performance.
While Sonnet 4 leads the 1M beta rollout, Opus 4.1 remains optimized for intensive reasoning tasks, sustaining a 200,000-token context window with faster average token generation speeds than previous models. The Heavy 4 private preview offers 256,000 tokens for select enterprise partners and introduces enhanced multimodal features, including depth-map vision for image understanding and early support for structured video frames.
For developers working at scale, the right model choice depends on workload complexity: the trade-off between latency, reasoning depth, and multimodal capability determines which model fits best, as the sketch below illustrates.
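As a rough illustration of that trade-off, the sketch below picks a model from an estimated input size and a latency budget, using the context and latency figures quoted in this article; the thresholds and model names are illustrative, not an official selection policy.

```python
# Illustrative model picker based on the context and latency figures quoted above.
# The numbers mirror this article's table; they are not an official Anthropic policy.
MODELS = [
    # (name, max context tokens, approx. first-token latency in seconds),
    # ordered from smallest to largest window so the smallest sufficient option wins.
    ("sonnet-4", 200_000, 1.2),
    ("opus-4.1", 200_000, 1.9),
    ("heavy-4-preview", 256_000, 2.1),
    ("sonnet-4-1m-beta", 1_000_000, 1.2),
]

def pick_model(estimated_input_tokens: int, max_latency_s: float) -> str:
    """Return the first model whose window fits the input within the latency budget."""
    for name, window, latency in MODELS:
        if estimated_input_tokens <= window and latency <= max_latency_s:
            return name
    raise ValueError("No listed model satisfies the given constraints.")

print(pick_model(estimated_input_tokens=350_000, max_latency_s=1.5))  # -> sonnet-4-1m-beta
```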
Pricing structures adapt to long-context workloads.
Anthropic has introduced new pricing tiers for token-intensive use cases. Costs differ by model and by whether requests use the 1M-token beta or the standard context window.
Tier | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Cache Read Discount
Sonnet 4 (standard) | $3.00 | $15.00 | N/A
Sonnet 4 (1M beta) | $6.00 | $22.50 | -75% on cached tokens
Opus 4.1 | $15.00 | $75.00 | N/A
Heavy 4 Preview | TBD | TBD | TBD
The context-cache beta now enables significant cost optimizations, offering ≈75% discounts on cached token reads, while cached writes add a 25% storage overhead. This is particularly relevant for scenarios involving frequent reference to stable datasets, enabling businesses to reduce costs without sacrificing context length.
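To make the pricing concrete, here is a small worked example that estimates the cost of one long-context request using the per-million-token rates in the table above; the token counts and cache split are invented for illustration, and the discount and overhead percentages are the ones this article quotes.

```python
# Rough cost estimate for one Sonnet 4 (1M beta) request, using the rates quoted above.
# Rates are USD per million tokens; token counts and the cache split are illustrative.
INPUT_RATE = 6.00            # Sonnet 4, 1M beta, input
OUTPUT_RATE = 22.50          # Sonnet 4, 1M beta, output
CACHE_READ_DISCOUNT = 0.75   # ~75% off cached input reads, per this article
CACHE_WRITE_OVERHEAD = 0.25  # ~25% extra on first-time cache writes, per this article

def request_cost(fresh_in: int, cached_in: int, cache_writes: int, out: int) -> float:
    """Estimate USD cost: fresh input + discounted cached reads + write overhead + output."""
    cost = fresh_in / 1e6 * INPUT_RATE
    cost += cached_in / 1e6 * INPUT_RATE * (1 - CACHE_READ_DISCOUNT)
    cost += cache_writes / 1e6 * INPUT_RATE * (1 + CACHE_WRITE_OVERHEAD)
    cost += out / 1e6 * OUTPUT_RATE
    return round(cost, 4)

# 100k fresh input tokens, 600k cached tokens re-read, 50k newly cached, 4k output tokens.
print(request_cost(fresh_in=100_000, cached_in=600_000, cache_writes=50_000, out=4_000))
```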
Technical best practices for managing large context windows.
As Claude’s token limits increase, efficient management becomes critical for maintaining performance and cost control. Anthropic has issued several recommendations:
Scenario | Recommendation | Token Impact
Large PDF uploads | Chunk files into ≤50,000-token segments | Avoids overflow
Code repositories | Compress and exclude build artefacts | Saves ~15-20% of tokens
Image processing | Batch ≤10 images per request | Prevents request failures
Audio inputs | Stream ≤2-minute segments | Minimizes processing delay
Video frames | Budget ≈120 tokens per frame | Controls multimodal usage
By adopting these optimizations, developers can avoid costly overages while ensuring the model processes high-volume data efficiently.
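For the first recommendation in the table, a simple chunker along these lines keeps each segment under a target token budget; it relies on a rough four-characters-per-token heuristic rather than an exact tokenizer, so both the ratio and the 50,000-token limit should be treated as approximations.

```python
# Naive document chunker: split text into segments of at most ~50,000 tokens each,
# using a rough heuristic of ~4 characters per token (an approximation, not a tokenizer).
CHARS_PER_TOKEN = 4
MAX_TOKENS_PER_CHUNK = 50_000
MAX_CHARS = CHARS_PER_TOKEN * MAX_TOKENS_PER_CHUNK

def chunk_text(text: str) -> list[str]:
    """Split on paragraph boundaries, starting a new chunk before the budget is exceeded."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) + 2 > MAX_CHARS:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks

# Example: a long extracted PDF text becomes a list of ≤50,000-token segments.
segments = chunk_text("First section...\n\nSecond section...\n\nThird section...")
print(len(segments), "chunk(s)")
```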
Roadmap includes future 2,000,000-token expansion.
Anthropic has confirmed plans to extend Claude’s capabilities to 2,000,000 tokens by the end of the year for enterprise partners, particularly those using Vertex AI and Bedrock. This roadmap reflects a broader trend towards ultra-long context AI systems, aimed at enabling:
- End-to-end reasoning across large-scale corporate datasets
- Improved multimodal indexing across documents, images, and video
- More advanced RAG (retrieval-augmented generation) pipelines
- Efficient consolidation of legal, medical, and financial archives
This evolution positions Claude among the leading tools for high-capacity document intelligence, bridging the gap between token efficiency and deep contextual reasoning.
____________
FOLLOW US FOR MORE.
DATA STUDIOS

