Claude for PDFs: How Anthropic’s AI Reads, Understands, and Analyzes Complex Documents in 2025

Jun 9, 2025

Claude can read and analyze PDF files directly. It processes both the visual layout and the underlying text of each page.

You can upload contracts, reports, scanned documents, or research papers.

Claude will extract information, interpret charts, and answer questions about the content.

This capability is available through both the chat interface and the API.

1 · Evolution of the Feature

Native PDF handling debuted in late 2024 with text-only extraction. Full multimodal processing (rasterised pages plus raw text) arrived in February 2025, followed in May by URL source blocks that let clients stream documents directly from the web. The capability shipped to general availability in the Claude 4 model family two months later.

2 · Server-Side Pre-Flight

When a document arrives, the gateway validates the MIME type, computes a SHA-256 hash for cache look-ups, then applies two hard limits: 32 MB of total size and 100 pages per request. Encrypted files, files with embedded JavaScript, and anything exceeding those ceilings are rejected before any tokens are consumed.
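The checks above can be mirrored on the client before a request is sent. The following is a minimal sketch, not the gateway's actual code: the function name `preflight`, the binary interpretation of 32 MB, and the caller-supplied `page_count` are all assumptions. Detecting encryption or embedded JavaScript needs a real PDF parser and is left out.

```python
import hashlib

# Assumed limits, taken from the figures quoted above.
MAX_BYTES = 32 * 1024 * 1024   # assumption: "32 MB" read as 32 MiB
MAX_PAGES = 100


def preflight(data: bytes, page_count: int, mime_type: str = "application/pdf") -> str:
    """Validate a PDF payload and return its SHA-256 hex digest.

    Hypothetical client-side helper. page_count must come from a PDF
    library of your choice; this sketch does not parse the file.
    """
    if mime_type != "application/pdf":
        raise ValueError(f"unsupported MIME type: {mime_type}")
    if not data.startswith(b"%PDF-"):
        raise ValueError("payload does not start with a PDF header")
    if len(data) > MAX_BYTES:
        raise ValueError(f"payload is {len(data)} bytes, over the size limit")
    if page_count > MAX_PAGES:
        raise ValueError(f"{page_count} pages exceeds the {MAX_PAGES}-page limit")
    # The digest can double as a cache key for repeated uploads.
    return hashlib.sha256(data).hexdigest()
```

Rejecting a file locally costs nothing, whereas a rejected request still costs a round trip.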

3 · Dual Extraction Pass

Each accepted page follows two parallel routes:

Route	Output	Purpose
Rasteriser	300 dpi PNG	Preserves spatial cues—chart axes, cell borders, page geometry
Text extractor	Unicode text stream	Captures searchable content, hidden copy, accessibility layers

Page images and text tokens are interleaved in natural page order, so prompts such as “Compare the pie charts on pages 7 and 23” can be resolved against the right pages.
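The interleaving step can be pictured with a short sketch. The `Page` type, the `interleave` function, and the block dictionaries are hypothetical names chosen for illustration; the real internal representation is not public.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int        # 1-based page number
    image_png: bytes   # output of the rasteriser route
    text: str          # output of the text-extractor route


def interleave(pages):
    """Return image and text blocks alternating in page order."""
    blocks = []
    for page in sorted(pages, key=lambda p: p.number):
        # Each page contributes its image first, then its extracted text,
        # so a reference to "page 7" maps to one adjacent pair of blocks.
        blocks.append({"kind": "image", "page": page.number, "data": page.image_png})
        blocks.append({"kind": "text", "page": page.number, "data": page.text})
    return blocks
```

Keeping both routes for a page next to each other is what lets a page-number reference in the prompt line up with a single location in the input.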

4 · Tokenisation Mechanics

Image bundle: ≈ 1 600 tokens per page
Text stream: 1 500 – 3 000 tokens per page

A token-counting endpoint lets developers pre-flight exact costs, enabling batching strategies for large uploads.
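Before calling the counting endpoint, a rough local estimate is often enough to plan batches. The sketch below uses only the per-page figures quoted in this section; the function names are hypothetical, and real counts vary with page density.

```python
# Per-page figures quoted in this section (approximate).
IMAGE_TOKENS_PER_PAGE = 1600
TEXT_TOKENS_PER_PAGE = (1500, 3000)   # (low, high)


def estimate_tokens(pages: int, with_images: bool = True):
    """Return a (low, high) token estimate for a document."""
    image = IMAGE_TOKENS_PER_PAGE if with_images else 0
    low = pages * (image + TEXT_TOKENS_PER_PAGE[0])
    high = pages * (image + TEXT_TOKENS_PER_PAGE[1])
    return low, high


def pages_per_batch(token_budget: int) -> int:
    """Largest page count whose worst-case estimate fits the budget."""
    worst_case_per_page = IMAGE_TOKENS_PER_PAGE + TEXT_TOKENS_PER_PAGE[1]
    return token_budget // worst_case_per_page


print(estimate_tokens(50))        # (155000, 230000)
print(pages_per_batch(180_000))   # 39
```

The estimate is only a planning aid. The counting endpoint remains the source of truth for billing.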

5 · Three Ingestion Pathways

URL block – zero base-64 overhead; ideal for public filings.
Base-64 block – suits ephemeral local files.
file_id reuse – upload once via the Files API, then reference the returned file_id across calls; cache hits skip re-tokenising.
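The three pathways differ only in the `source` part of the document content block. The sketch below builds each variant as a plain dictionary. The field names follow Anthropic's public Messages API documentation as understood at the time of writing, so check them against the current reference before use; the helper names, the URL, and the file_id value are placeholders.

```python
import base64


def url_block(url: str) -> dict:
    """Document block that points at a publicly reachable PDF."""
    return {"type": "document", "source": {"type": "url", "url": url}}


def base64_block(pdf_bytes: bytes) -> dict:
    """Document block that embeds the PDF in the request body."""
    encoded = base64.standard_b64encode(pdf_bytes).decode("ascii")
    return {
        "type": "document",
        "source": {"type": "base64", "media_type": "application/pdf", "data": encoded},
    }


def file_block(file_id: str) -> dict:
    """Document block that reuses a file uploaded earlier via the Files API."""
    return {"type": "document", "source": {"type": "file", "file_id": file_id}}


# Placeholder values for illustration only.
content = [
    url_block("https://example.com/report.pdf"),
    {"type": "text", "text": "Summarise the key findings."},
]
```

A block of this kind goes in the `content` list of a user message, alongside the text prompt.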

6 · Client-Side Guardrails

Claude.ai UI: 30 MB per file, 20 files per conversation.
Vision cut-over: pages beyond 100 are delivered as text only—charts become opaque but prose remains accessible.

These limits target latency and token containment; they are not model constraints.
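A client can check these guardrails before a user hits them. This is a minimal sketch under the limits quoted above; the function names are hypothetical and the 30 MB figure is read as binary megabytes, which is an assumption.

```python
UI_MAX_FILE_BYTES = 30 * 1024 * 1024   # assumption: "30 MB" read as 30 MiB
UI_MAX_FILES = 20
VISION_PAGE_LIMIT = 100


def check_ui_upload(file_sizes):
    """Return a list of human-readable problems; empty means acceptable."""
    problems = []
    if len(file_sizes) > UI_MAX_FILES:
        problems.append(f"{len(file_sizes)} files exceeds the {UI_MAX_FILES}-file limit")
    for index, size in enumerate(file_sizes):
        if size > UI_MAX_FILE_BYTES:
            problems.append(f"file {index} is {size} bytes, over the per-file limit")
    return problems


def delivery_mode(page_number: int) -> str:
    """Pages past the vision limit are delivered as text only."""
    return "vision+text" if page_number <= VISION_PAGE_LIMIT else "text-only"
```

Flagging pages that fall into text-only mode tells users in advance which charts will not be visible to the model.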

7 · Capability Matrix

Model	Vision PDFs ≤ 100 pp	Text-only fallback	Context window
Opus 4	✔	—	200 k
Sonnet 4	✔	—	200 k
Sonnet 3.7	✔	—	200 k
Sonnet 3.5	✔	—	200 k
Haiku 3.5	✔	—	200 k
Legacy 3.x	—	✔	200 k

Bedrock currently exposes text-only mode while vision rollout continues region by region.

8 · Cost Modelling

At 2025 rates (Sonnet 4: $3 per million input tokens, with image and text tokens billed at the same input rate), a 50-page annual report analysed with full vision incurs roughly $0.47 to $0.69 in input cost, using the per-page token figures from section 4. Eliminating images cuts that cost by roughly a third to a half but removes chart comprehension, a poor trade-off for business-intelligence audits yet acceptable for redline comparisons.
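The arithmetic behind an estimate like this is easy to reproduce. In the sketch below the rates are parameters rather than constants, because pricing changes. The sample call passes $3 per million for both token types purely as a placeholder assumption, and the 2,250 text tokens per page is simply the midpoint of the range given in section 4. Output tokens are not included.

```python
def estimate_input_cost_usd(
    pages: int,
    image_rate_per_mtok: float,
    text_rate_per_mtok: float,
    image_tokens_per_page: int = 1600,   # figure from section 4
    text_tokens_per_page: int = 2250,    # assumed midpoint of 1,500 to 3,000
    with_images: bool = True,
) -> float:
    """Estimate input-token cost in USD for one pass over a document."""
    image_tokens = pages * image_tokens_per_page if with_images else 0
    text_tokens = pages * text_tokens_per_page
    total = image_tokens * image_rate_per_mtok + text_tokens * text_rate_per_mtok
    return total / 1_000_000


# Placeholder rates: substitute current published pricing before relying on this.
full_vision = estimate_input_cost_usd(50, image_rate_per_mtok=3.0, text_rate_per_mtok=3.0)
text_only = estimate_input_cost_usd(50, 3.0, 3.0, with_images=False)
print(f"full vision: ${full_vision:.2f}, text only: ${text_only:.2f}")
```

Running the function with and without images shows how much of the bill comes from the rasterised pages.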

9 · Security & Compliance

The render pass strips scripts, embedded fonts, and LZW streams. Files rest in encrypted object storage and inherit the organisation’s retention policy (default 30 days, configurable down to one hour). Enterprise tenants can supply customer-managed keys to satisfy SOC 2 and GDPR transfer clauses.

10 · Benchmark Evidence

A June 2025 head-to-head across literature, legal, and scientific PDFs graded 115 queries and found Claude the sole model with zero hallucinated citations—directly attributable to its unified vision-plus-text feed.

11 · Architectural Trade-Offs

100-page ceiling: keeps mixed-mode requests below ~400 k tokens once prompts and chain-of-thought overhead are included, so that they fit within the context window after page eviction.
PNG over JPEG: lossless renders preserve gridlines and one-pixel tick marks critical for table recovery.
Per-page bundling: enables sliding-window eviction so giant appendices can drop early pages without discarding entire context.
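Per-page bundling makes eviction a simple queue operation. The sketch below illustrates the idea only and does not describe Anthropic's implementation: the function name, the `(page_number, tokens)` tuple shape, and the oldest-first policy are assumptions.

```python
from collections import deque


def evict_to_budget(bundles, budget: int):
    """Drop the earliest page bundles until the total fits the budget.

    bundles: iterable of (page_number, tokens) in page order.
    Returns (kept_bundles, evicted_page_numbers).
    """
    window = deque(bundles)
    total = sum(tokens for _, tokens in window)
    evicted = []
    while window and total > budget:
        page, tokens = window.popleft()   # oldest page leaves first
        total -= tokens
        evicted.append(page)
    return list(window), evicted


kept, dropped = evict_to_budget([(1, 4000), (2, 4000), (3, 4000), (4, 4000)], budget=9000)
print(kept, dropped)   # [(3, 4000), (4, 4000)] [1, 2]
```

Because each bundle holds both the image and the text for a page, evicting one never leaves half a page behind.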

12 · Engineering Playbook

Split long scans at logical divisions to keep each chunk within the vision limit.
Ask Claude to emit CSV for wide numeric tables, then feed downstream analytics.
Cache uploads by document hash to avoid repeated billing across sessions.
Prepend a web search block when external data (e.g., stock tickers) must be current.
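The first tip, splitting at logical divisions, can be sketched as a greedy packing of sections into chunks. The function name and the input shape (a sorted list of 1-based section start pages) are assumptions for illustration. A section longer than the limit is cut at hard page boundaries, since nothing better is available.

```python
def chunk_sections(section_starts, total_pages: int, limit: int = 100):
    """Group consecutive sections into chunks of at most `limit` pages.

    section_starts: sorted 1-based first pages of each section, beginning with 1.
    Returns a list of inclusive (first_page, last_page) ranges.
    """
    starts = list(section_starts)
    ends = [s - 1 for s in starts[1:]] + [total_pages]
    chunks = []
    current = None   # (first, last) of the chunk being built
    for first, last in zip(starts, ends):
        # An oversized section cannot share a chunk: flush, then hard-split it.
        while last - first + 1 > limit:
            if current is not None:
                chunks.append(current)
                current = None
            chunks.append((first, first + limit - 1))
            first += limit
        if current is None:
            current = (first, last)
        elif last - current[0] + 1 <= limit:
            current = (current[0], last)   # section still fits; extend the chunk
        else:
            chunks.append(current)
            current = (first, last)
    if current is not None:
        chunks.append(current)
    return chunks


print(chunk_sections([1, 41, 91, 151], total_pages=260))
# [(1, 90), (91, 150), (151, 250), (251, 260)]
```

Each returned range can then be extracted into its own PDF and sent as a separate request, keeping every chunk inside the vision limit.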

13 · Forward Roadmap

In-progress work includes per-page streaming responses, differential pricing for text-only requests, and full vision enablement on additional Bedrock regions. Release cadence suggests the next expansion window in August 2025.

_______

DATA STUDIOS

datastudios.org