
Google AI Studio — File Upload and Reading: formats, limits, structured output, and long-context workflows


Google AI Studio (the workspace for building with Gemini via the Gemini API) lets you upload files, ground prompts on them, and produce structured outputs—all backed by Gemini’s long-context models. Under the hood, AI Studio rides on the Gemini Files API and the same capabilities exposed in Vertex AI for enterprise deployments. Below is a practical, source-backed guide to what you can upload, how big it can be, how long it persists, and the best patterns for reliable document Q&A and data extraction.

·····

.....

Three surfaces to know: AI Studio, Vertex AI, and Gemini Apps.

  • Google AI Studio (Gemini API): developer console + API where you can upload files via the Files API (2 GB per file, 20 GB per project storage, retained 48 hours). Free to use; useful for prototyping multimodal prompts that read PDFs, images, audio, etc.

  • Vertex AI (enterprise): production deployment with admin, quotas, and broader doc caps (for example, Gemini 2.5 Pro lists a 500 MB input size limit; images up to 3,000 per prompt and 7 MB each; doc limits vary by modality).

  • Gemini Apps (consumer/pro): the end-user chat app also supports file uploads (up to 10 files per prompt; most file types up to 100 MB, videos up to 2 GB; audio/video length caps depend on plan). Handy to quickly test the same file behaviors you’ll automate later.

Rule of thumb: prototype in AI Studio with the Files API, graduate to Vertex AI for governed scale, and sanity-check UX constraints in Gemini Apps when you care about end-user experience caps.
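The rule of thumb above can be encoded as a quick pre-flight check before picking an ingestion path. A minimal sketch using the per-surface caps quoted in this guide (the function and surface names are illustrative, not part of any SDK; always confirm caps against current docs):

```python
# Per-surface upload caps as described in this guide (bytes).
SURFACE_LIMITS = {
    "files_api": 2 * 1024**3,      # AI Studio Files API: 2 GB per file
    "vertex": 500 * 1024**2,       # Vertex AI Gemini 2.5 Pro: 500 MB per call
    "gemini_apps": 100 * 1024**2,  # Gemini Apps: 100 MB for most file types
}

def surfaces_that_fit(size_bytes: int) -> list[str]:
    """Return the surfaces whose per-file/input cap accommodates this payload."""
    return [s for s, cap in SURFACE_LIMITS.items() if size_bytes <= cap]

# Example: a 300 MB PDF fits the Files API and Vertex, but not Gemini Apps.
print(surfaces_that_fit(300 * 1024**2))  # ['files_api', 'vertex']
```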

·····

.....

What you can upload—and how Gemini reads it.

| File type | What Gemini does | Notes |
| --- | --- | --- |
| PDF | Parses text layers, headings, tables; can reason over embedded figures when provided as images; supports section/page-scoped Q&A. | AI Studio via Files API (2 GB/file, 48 hr retention). Vertex adds larger pipelines and batch options. |
| Images (PNG/JPEG/WEBP) | Vision understanding of charts/diagrams/screenshots. | Vertex spec: up to 3,000 images/prompt, 7 MB/image. |
| Audio | Transcribe/summarize; highlight key points. | Gemini Apps recently enabled audio uploads with plan-based length caps. |
| Text/Docs (TXT/MD/DOCX) | Extract, summarize, transform; ideal for policy docs and briefs. | Works well with structured output JSON schemas. |
| Data (CSV/TSV) | Column/row parsing; compute metrics; emit tables/CSV/JSON. | Use schemas to lock types and field names. |

Workspace extras: In Google Workspace, Gemini can auto-summarize PDFs stored in Drive (shows a summary card when you open a PDF), which is useful for quick triage before pushing files into AI Studio/Vertex flows.
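As a sketch of the Files API path from the table above: upload once, then reference the file handle in a grounded prompt. This assumes the `google-genai` Python SDK (`pip install google-genai`) and a `GEMINI_API_KEY` environment variable; the `describe_upload` helper is purely illustrative:

```python
import mimetypes
import os

def describe_upload(path: str) -> str:
    """Illustrative helper: report the MIME type the API will infer for a file."""
    mime, _ = mimetypes.guess_type(path)
    return mime or "application/octet-stream"

def ask_about_pdf(path: str, question: str) -> str:
    """Upload a PDF via the Files API and ground a question on it.
    Sketch only: requires the google-genai SDK and GEMINI_API_KEY."""
    from google import genai  # imported lazily so the helper above works without the SDK
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    uploaded = client.files.upload(file=path)  # 2 GB/file, retained 48 hours
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=[uploaded, question],
    )
    return response.text

print(describe_upload("report.pdf"))  # application/pdf
```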

·····

.....

Hard limits & retention (what to plan around).

| Surface | Per-file limit | Project / storage limit | Retention | Notes |
| --- | --- | --- | --- | --- |
| Gemini Files API (AI Studio) | 2 GB | 20 GB per project | 48 hours | Free to use in all Gemini API regions. |
| Vertex AI – Gemini 2.5 Pro | 500 MB input size (per call) | Quota-based | By project policy | Image caps: 3,000/prompt, 7 MB each. |
| Gemini Apps | 100 MB for most file types; video 2 GB | Per account | Not specified | Up to 10 files/prompt; audio/video length depends on plan. |

Context window: Gemini’s long-context models support ~1M tokens (and more on emerging SKUs), which lets you ground prompts on very large docs—though retrieval/section strategies still beat raw dumping for speed and accuracy.
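To gauge whether a document even approaches that window, a common heuristic is roughly four characters per token for English text. This is a rough estimate only, not Gemini's actual tokenizer; use the API's token-counting endpoint when you need the real number:

```python
def rough_token_estimate(text: str) -> int:
    """Heuristic: ~4 characters per English token. Use the API's
    token-counting endpoint when you need the exact count."""
    return max(1, len(text) // 4)

# A 200-page report at ~3,000 characters/page sits far below a ~1M-token window.
report_chars = 200 * 3000
print(rough_token_estimate("x" * report_chars))  # 150000
```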

·····

.....

Make outputs dependable: use Structured Output.

Instead of “please return JSON,” supply a response schema—Gemini will validate responses against it. This works in AI Studio (Gemini API), Firebase AI Logic, and Vertex AI. Benefits: consistent keys, types, and arrays across runs.

Good pattern:

  • Natural instruction: “Return only JSON conforming to the schema.”

  • Attach responseSchema with required fields (string, number, array, enums).

  • Keep examples aligned with the schema to avoid drift.
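A minimal sketch of that pattern: the schema below (field names are illustrative) is passed as `response_schema` alongside `response_mime_type="application/json"`; in the `google-genai` SDK this maps onto `types.GenerateContentConfig`. The schema itself is plain data, so it can be built and sanity-checked without the SDK:

```python
# OpenAPI-style response schema for KPI extraction (field names illustrative).
KPI_SCHEMA = {
    "type": "object",
    "properties": {
        "kpis": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "value": {"type": "number"},
                    "unit": {"type": "string", "enum": ["USD", "percent", "count"]},
                },
                "required": ["name", "value", "unit"],
            },
        }
    },
    "required": ["kpis"],
}

def make_generation_config() -> dict:
    """Sketch of the generation config: with the google-genai SDK these keys
    correspond to GenerateContentConfig(response_mime_type=..., response_schema=...)."""
    return {
        "response_mime_type": "application/json",
        "response_schema": KPI_SCHEMA,
    }

print(make_generation_config()["response_mime_type"])  # application/json
```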

·····

.....

Bring your own tools: Function Calling for retrieval & actions.

Declare functions (name/description/typed params) and let Gemini call tools when it needs external data—DB lookups, Drive search by ID, or page-chunk retrieval for PDFs. Feed tool results back to the model in the next turn. This keeps prompts short while grounding answers in your source.

Typical PDF pipeline: Files API upload → chunk/index → retrieve top-k → model synthesis (JSON schema). Use function calling for the retrieval step and for post-processing (e.g., write CSV to Cloud Storage).
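The retrieval tool above can be sketched as a function declaration plus a dispatcher (the tool name, parameters, and dispatcher are hypothetical): the declaration is plain data handed to the model; when the model responds with a function call, you execute it and feed the result back in the next turn.

```python
# Hypothetical retrieval tool, declared in the OpenAPI-style format
# the Gemini API expects for function calling.
RETRIEVE_CHUNKS_DECL = {
    "name": "retrieve_pdf_chunks",
    "description": "Return the top-k most relevant chunks of an uploaded PDF.",
    "parameters": {
        "type": "object",
        "properties": {
            "file_id": {"type": "string", "description": "Files API file id"},
            "query": {"type": "string"},
            "k": {"type": "integer", "description": "Number of chunks, e.g. 4"},
        },
        "required": ["file_id", "query"],
    },
}

def dispatch(call_name: str, args: dict) -> dict:
    """Route a model-issued function call to your implementation (stubbed here)."""
    if call_name == "retrieve_pdf_chunks":
        k = args.get("k", 4)
        # Real code would query your index; this stubs the shape of the result.
        return {"chunks": [f"chunk-{i}" for i in range(k)]}
    raise ValueError(f"Unknown tool: {call_name}")

print(dispatch("retrieve_pdf_chunks", {"file_id": "f1", "query": "revenue"}))
```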

·····

.....

Recommended workflows (step-by-step).

A) PDF KPI extraction (AI Studio)

  1. Upload the PDF via the Files API (2 GB cap).
  2. Chunk/index (Cloud Functions or your app).
  3. Retrieve top-k sections for each query (tool/function).
  4. Ask Gemini to emit KPIs as validated JSON.
  5. Save CSV/JSON downstream.
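The chunk-and-retrieve steps of this pipeline can be sketched without any external index: naive fixed-size chunking plus keyword-overlap scoring, a stand-in for a real embedding index (function names are illustrative; the interface is what matters):

```python
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with a small overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def top_k(chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Score chunks by keyword overlap with the query; a real pipeline
    would use embeddings, but the interface stays the same."""
    terms = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(terms & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

doc = "Revenue grew 12% in Q3. " * 40 + "Headcount was flat. " * 40
hits = top_k(chunk(doc, size=120, overlap=20), "revenue growth in Q3", k=2)
print(all("Revenue" in h for h in hits))  # True
```

Only the retrieved chunks then go into the model call, which keeps the prompt small regardless of document size.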

B) Slide & chart explanation (Vertex)

Use Gemini 2.5 Pro with images (charts/screens) respecting per-image caps (7 MB, up to 3,000 per call). Ask for a two-column Markdown: “Insight” and “Evidence (figure/page).”

C) Audio/lecture recap (Apps → Studio)

Upload audio in Gemini Apps to summarize/transcribe, then bring the cleaned text into AI Studio for structured extraction. Plan around file count/length limits (apps: 10 files/prompt; plan-based time caps).

·····

.....

Prompt templates that work.

| Goal | Template |
| --- | --- |
| Page-scoped Q&A (PDF) | “From pages 9–14 of the uploaded PDF (id: ___), extract dates, amounts (USD), counterparties. Return JSON {page, date_iso, amount_usd, party}.” |
| Chart explanation (images) | “Explain this chart in 3 bullets: trend, outliers, implication. Then list the exact labels/value ranges you used.” |
| Table to CSV | “Convert the table on page 5 to CSV with headers. Normalize thousands separators and ISO-date the first column.” |
| Policy diff (two PDFs) | “Compare sections titled ‘Data Retention’ across the two docs; return a table {clause, old_text, new_text, risk_note}.” |
| Entity extraction (doc/CSV) | “Return only JSON per schema {entity, category_enum, source_ref}. If missing, set category_enum to unknown.” |

Pair these with responseSchema to lock structure.
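The “Table to CSV” template asks the model to normalize separators and dates, but it is safer to verify (or redo) that normalization deterministically after the fact. A small sketch, assuming US-style source formats (helper names are illustrative):

```python
from datetime import datetime

def to_number(cell: str) -> float:
    """Strip thousands separators, e.g. '1,234,567.5' -> 1234567.5."""
    return float(cell.replace(",", "").strip())

def to_iso_date(cell: str, fmt: str = "%m/%d/%Y") -> str:
    """Normalize e.g. '03/05/2024' to ISO '2024-03-05' (source format assumed)."""
    return datetime.strptime(cell.strip(), fmt).date().isoformat()

print(to_number("1,234,567.5"))   # 1234567.5
print(to_iso_date("03/05/2024"))  # 2024-03-05
```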

·····

.....

Troubleshooting (quick fixes).

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| “File too large” (Studio) | > 2 GB file or project > 20 GB | Split or compress; rotate inactive files; mind 48-hour retention. |
| Truncated image set (Vertex) | Exceeded 7 MB/image or 3,000 images | Compress images; batch calls. |
| Messy table extraction | Complex layout/scan | Render page to image; ask for CSV; use retrieval by page. |
| Inconsistent JSON | No schema enforcement | Use Structured Output (responseSchema) and ask for “JSON only.” |
| Slow/expensive long prompts | Over-stuffed context | Use retrieval via function calling; pass only relevant chunks. |

·····

.....

Why long context matters (and how not to waste it).

Gemini’s ~1M-token window makes whole-report analysis feasible, but you’ll still get better latency, cost, and accuracy by retrieving the right 2–8 chunks per question and constraining outputs. Treat the big window as headroom, not a target.

·····

.....

Quick comparison of ingestion paths.

| Path | Best for | Pros | Cons |
| --- | --- | --- | --- |
| Files API (AI Studio) | Prototyping multimodal prompts | 2 GB/file; free; simple | 48-hour retention; 20 GB/project cap |
| Vertex AI (Gemini 2.5 Pro) | Production apps, batch jobs | Enterprise quotas; image/doc specs; 1M context | Per-modality limits, billing applies |
| Gemini Apps | End-user testing/triage | 10 files/prompt; quick summaries | Smaller file caps; consumer plan gating |

·····

.....

The bottom line.

Use AI Studio’s Files API for fast, free prototyping (2 GB/file, 20 GB/project, 48-hour retention). When you need governance and scale, move to Vertex AI and combine function calling + structured output with Gemini’s long context to deliver reliable PDF/table/slide Q&A. Keep inputs small with retrieval, lock outputs with schemas, and respect per-surface limits to avoid stalls and drift.

.....


DATA STUDIOS
