Grok AI File Upload and Reading: Supported Formats, Token Limits, and Workflow Tips

Dec 14, 2025
Grok supports PDFs, spreadsheets, text, and code archives up to fifty megabytes apiece.

The Files API and chat drop-zone both accept PDF, CSV, XLSX, TXT, Markdown, and ZIP archives containing source code.

Each file may weigh no more than 50 MB (plain-text and Markdown files are capped at 10 MB), and spreadsheets should remain below 200 000 rows to avoid time-outs.

When a document is uploaded, Grok auto-invokes a document_search tool, enabling natural-language Q&A, table stats, and formula explanations.

Oversized or unsupported uploads trigger a truncation warning, and Grok falls back to a plain-text extract of the file.

··········

Supported Formats and Limits

··········

File Type	Size / Row Limit	Special Notes
PDF (text / scanned)	50 MB	Vision layout parsing slows on scans
CSV	200 000 rows	Column stats, anomaly flags
XLSX	200 000 rows	Formula explanation supported
TXT / Markdown	10 MB	Direct ingestion
ZIP (code)	50 MB	Multi-file code review
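
The limits in the table can be checked locally before anything is sent. The sketch below is only an illustration: the function name `check_upload` and the dictionary layout are hypothetical, and the numbers are copied from this article rather than from any official Grok client.

```python
from pathlib import Path
from typing import List, Optional

MB = 1024 * 1024

# Per-type size caps taken from the table above (illustrative layout).
SIZE_LIMITS_MB = {".pdf": 50, ".txt": 10, ".md": 10, ".zip": 50}
ROW_LIMITED = {".csv", ".xlsx"}   # types limited by row count
MAX_ROWS = 200_000
MAX_FILE_MB = 50                  # general per-file cap stated in the article


def check_upload(name: str, size_bytes: int,
                 row_count: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the file fits the limits."""
    ext = Path(name).suffix.lower()
    if ext not in SIZE_LIMITS_MB and ext not in ROW_LIMITED:
        return [f"unsupported type: {ext or '(none)'}"]

    problems = []
    limit_mb = SIZE_LIMITS_MB.get(ext, MAX_FILE_MB)
    if size_bytes > limit_mb * MB:
        problems.append(f"file exceeds {limit_mb} MB")
    if ext in ROW_LIMITED and row_count is not None and row_count > MAX_ROWS:
        problems.append(f"more than {MAX_ROWS} rows")
    return problems
```

Calling `check_upload("notes.md", 11 * MB)` reports the 10 MB cap, while a 10 MB PDF passes with an empty list.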

··········

A 256 000-token context window means large files crowd Grok’s memory budget.

Grok’s long-context limit tops out at 256 K tokens per session.

Upload a 50 MB PDF—roughly 200 K tokens—and only 56 K tokens remain for prompts, intermediate reasoning, and replies.

Once the ceiling approaches, Grok begins sliding the window, discarding older content in favor of newer input.

Token budgeting is therefore critical when chaining multi-file analyses.

··········

Context Window Impact Examples

··········

Upload Scenario	Tokens Consumed	Tokens Left
25 MB PDF (≈ 100 K)	100 000	156 000
50 MB PDF (≈ 200 K)	200 000	56 000
5 MB CSV (≈ 20 K)	20 000	236 000
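
The arithmetic behind the table is simple enough to script. The sketch below assumes the rough ratio implied by the examples above (about 4 000 tokens per megabyte), which is this article's rule of thumb and not a measured tokenizer figure; the function names are hypothetical.

```python
CONTEXT_WINDOW = 256_000   # tokens per session, as stated above
TOKENS_PER_MB = 4_000      # rough ratio implied by the table; an assumption


def estimate_tokens(size_mb: float) -> int:
    """Rough token estimate for a file of the given size."""
    return int(size_mb * TOKENS_PER_MB)


def tokens_left(*upload_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens remaining for prompts, reasoning, and replies after the uploads."""
    return max(window - sum(upload_tokens), 0)


for size_mb in (25, 50, 5):
    used = estimate_tokens(size_mb)
    print(f"{size_mb} MB -> {used} tokens used, {tokens_left(used)} left")
```

Passing several estimates to `tokens_left` shows how quickly a multi-file session exhausts the window: a 50 MB and a 25 MB PDF together already exceed it.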

··········

Core strengths include table extraction, formula explanation, and document-search Q&A.

Grok identifies table headers, computes averages, medians, and outliers, then outputs clean Markdown or CSV.

Excel uploads receive plain-English breakdowns of nested formulas like VLOOKUP or INDEX-MATCH chains.

PDFs can be queried page-by-page; Grok cites paragraph numbers when delivering answers.

Code archives trigger syntax-aware comments, complexity ratings, and refactor suggestions.

··········

Core File-Reading Features

··········

Capability	Example Result
Table stats	Mean, median, outlier rows
Formula explanation	“Cell D7 = SUM(B:B) ÷ COUNT(B:B)”
PDF Q&A	“Page 12, paragraph 3 defines KPI targets.”
Code review	Cyclomatic complexity and refactor plan
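
To sanity-check the table stats Grok reports, the same figures can be reproduced locally. The article does not say how Grok defines an outlier, so this sketch assumes the common 1.5 × IQR rule; `column_stats` is a hypothetical helper, not part of any Grok API.

```python
import statistics
from typing import Dict, List, Union

Number = Union[int, float]


def column_stats(values: List[Number]) -> Dict[str, object]:
    """Mean, median, and outliers (1.5 x IQR rule, an assumed definition)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "outliers": [v for v in values if v < low or v > high],
    }


stats = column_stats([10, 12, 11, 13, 12, 11, 95])
print(stats["median"], stats["outliers"])  # prints: 12 [95]
```

If Grok flags different rows than this check, the difference is most likely in the outlier definition, which is worth asking it to state explicitly.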

··········

Free preview quotas, startup credits, and forthcoming tiers define how many files you can process.

Individual preview accounts throttle after roughly five hours of active agent use, queuing new uploads until quotas reset.

Google Cloud Activate credits grant startups $300 of GPU time—enough for thousands of medium PDFs before billing.

Hackathon and community vouchers temporarily triple rate limits or grant unlimited access for fourteen to thirty days.

Paid Developer Plus and Team tiers, slated for 2026, will add dedicated caches, higher concurrency, and audit logging.

··········

Access Paths and File Quotas

··········

Path	Preview Quota	Future Tier	Notes
Individual preview	50 MB total/day	Developer Plus	Low GPU priority
Startup (Activate)	$300 credits	—	Startup only
Team preview	3× individual	Team	Shared quota

··········

Best practices keep large-file workflows reliable and within token budgets.

Split massive PDFs into 10–20 MB chunks, referencing page ranges to conserve tokens.
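
One way to plan such a split is to turn the file size into page ranges before touching the PDF. The sketch below assumes file size is spread evenly across pages, which is a simplification (scanned pages are often heavier), and `plan_page_ranges` is a hypothetical helper; the actual cutting would be done with whatever PDF tool you already use.

```python
import math
from typing import List, Tuple


def plan_page_ranges(total_pages: int, file_mb: float,
                     target_mb: float = 15.0) -> List[Tuple[int, int]]:
    """Return contiguous 1-based (first, last) page ranges of roughly target_mb each."""
    if total_pages < 1 or file_mb <= 0 or target_mb <= 0:
        raise ValueError("total_pages, file_mb, and target_mb must be positive")
    # Number of chunks needed, never more than one chunk per page.
    chunks = min(max(1, math.ceil(file_mb / target_mb)), total_pages)
    per_chunk = math.ceil(total_pages / chunks)
    return [
        (start, min(start + per_chunk - 1, total_pages))
        for start in range(1, total_pages + 1, per_chunk)
    ]


print(plan_page_ranges(400, 50))
# prints: [(1, 100), (101, 200), (201, 300), (301, 400)]
```

The resulting ranges double as the page references to quote in prompts, for example "pages 101 to 200 of the annual report".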

Embed IDs or page numbers in prompts so Grok can cite exact locations without reloading entire files.

Cache static glossaries or appendices as separate uploads, reusing them across sessions via file IDs.

Watch token counters; summarize or offload context when usage nears 240 K tokens to avoid silent truncation.

Export Grok’s answers and citations into external knowledge bases before the sliding window discards critical context.
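
The 240 K warning threshold can be tracked with a small running counter on the client side. This is a minimal sketch: the `ContextBudget` class and its status strings are hypothetical, the token counts are whatever estimates you supply, and nothing here reads Grok's real usage figures.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ContextBudget:
    window: int = 256_000       # hard ceiling stated in the article
    soft_limit: int = 240_000   # point at which to summarize or offload
    used: int = 0
    log: List[Tuple[str, int]] = field(default_factory=list)

    def add(self, label: str, tokens: int) -> str:
        """Record an upload or message and return the resulting status."""
        self.used += tokens
        self.log.append((label, tokens))
        return self.status()

    def status(self) -> str:
        if self.used >= self.window:
            return "over"        # older content is likely being discarded
        if self.used >= self.soft_limit:
            return "summarize"   # time to condense or export context
        return "ok"


budget = ContextBudget()
print(budget.add("50 MB PDF", 200_000))     # prints: ok
print(budget.add("follow-up chat", 45_000)) # prints: summarize
```

When the status turns to "summarize", exporting answers and citations first keeps them safe before the window starts sliding.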

··········

DATA STUDIOS

··········

[datastudios.org]