Google AI Studio — File Upload and Reading: formats, limits, structured output, and long-context workflows
- Graziano Stefanelli

Google AI Studio (the workspace for building with Gemini via the Gemini API) lets you upload files, ground prompts on them, and produce structured outputs—all backed by Gemini’s long-context models. Under the hood, AI Studio rides on the Gemini Files API and the same capabilities exposed in Vertex AI for enterprise deployments. Below is a practical, source-backed guide to what you can upload, how big it can be, how long it persists, and the best patterns for reliable document Q&A and data extraction.
·····
.....
Two surfaces to know: AI Studio vs. Vertex AI (and Gemini Apps).
Google AI Studio (Gemini API): developer console + API where you can upload files via the Files API (2 GB per file, 20 GB per project storage, retained 48 hours). Free to use; useful for prototyping multimodal prompts that read PDFs, images, audio, etc.
Vertex AI (enterprise): production deployment with admin, quotas, and broader doc caps (for example, Gemini 2.5 Pro lists a 500 MB input size limit; images up to 3,000 per prompt and 7 MB each; doc limits vary by modality).
Gemini Apps (consumer/pro): the end-user chat app also supports file uploads (up to 10 files per prompt; most file types up to 100 MB, videos up to 2 GB; audio/video length caps depend on plan). Handy to quickly test the same file behaviors you’ll automate later.
Rule of thumb: prototype in AI Studio with the Files API, graduate to Vertex AI for governed scale, and sanity-check UX constraints in Gemini Apps when you care about end-user experience caps.
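The Files API caps above (2 GB per file, 20 GB per project) can be guarded in code before an upload ever starts. A minimal sketch in Python: the helper is plain arithmetic over the documented limits, while the upload itself is shown in comments and assumes the google-genai SDK with a valid API key.

```python
# Pre-flight check against the Files API caps: 2 GB per file,
# 20 GB per project (files are retained for 48 hours).
MAX_FILE_BYTES = 2 * 1024**3      # 2 GB per file
MAX_PROJECT_BYTES = 20 * 1024**3  # 20 GB per project

def can_upload(file_bytes: int, used_bytes: int) -> bool:
    """True if the file fits both the per-file and project caps."""
    return (file_bytes <= MAX_FILE_BYTES
            and used_bytes + file_bytes <= MAX_PROJECT_BYTES)

# Upload sketch (google-genai SDK; requires an API key, not executed here):
# from google import genai
# client = genai.Client()
# f = client.files.upload(file="report.pdf")
# resp = client.models.generate_content(
#     model="gemini-2.5-pro",
#     contents=[f, "Summarize section 2."],
# )
```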
·····
.....
What you can upload—and how Gemini reads it.
File type | What Gemini does | Notes |
PDFs | Parses text layers, headings, tables; can reason over embedded figures when provided as images; supports section/page-scoped Q&A. | AI Studio via Files API (2 GB/file, 48 hr retention). Vertex adds larger pipelines and batch options. |
Images (PNG/JPEG/WEBP) | Vision understanding of charts/diagrams/screenshots. | Vertex spec: up to 3,000 images/prompt, 7 MB/image. |
Audio | Transcribe/summarize; highlight key points. | Gemini Apps recently enabled audio uploads with plan-based length caps. |
Text/Docs (TXT/MD/DOCX) | Extract, summarize, transform; ideal for policy docs and briefs. | Works well with structured output JSON schemas. |
Data (CSV/TSV) | Column/row parsing; compute metrics; emit tables/CSV/JSON. | Use schemas to lock types and field names. |
Workspace extras: In Google Workspace, Gemini can auto-summarize PDFs stored in Drive (shows a summary card when you open a PDF), which is useful for quick triage before pushing files into AI Studio/Vertex flows.
·····
.....
Hard limits & retention (what to plan around).
Surface | Per-file limit | Project / storage limit | Retention | Notes |
Gemini Files API (AI Studio) | 2 GB | 20 GB per project | 48 hours | Free to use in all Gemini API regions. |
Vertex AI – Gemini 2.5 Pro | 500 MB input size (per call) | Quota-based | By project policy | Image caps: 3,000/prompt, 7 MB each. |
Gemini Apps | 100 MB for most file types; video 2 GB | — | Per account | Up to 10 files/prompt; audio/video length depends on plan. |
Context window: Gemini’s long-context models support ~1M tokens (and more on emerging SKUs), which lets you ground prompts on very large docs—though retrieval/section strategies still beat raw dumping for speed and accuracy.
·····
.....
Make outputs dependable: use Structured Output.
Instead of “please return JSON,” supply a response schema and Gemini will validate responses against it. This works in AI Studio (Gemini API), Firebase AI Logic, and Vertex AI. Benefits: consistent keys, types, and arrays across runs.
Good pattern:
Natural instruction: “Return only JSON conforming to the schema.”
Attach responseSchema with required fields (string, number, array, enums).
Keep examples aligned with the schema to avoid drift.
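The pattern above can be sketched concretely. The schema below follows the JSON-schema style the Gemini API accepts for structured output, using the KPI fields from the page-scoped Q&A template later in this article; the generate_content call is commented out because it needs an API key, and is a sketch assuming the google-genai SDK.

```python
# A response schema for KPI extraction: required fields with locked types,
# in the JSON-schema style accepted by the Gemini API's response_schema.
KPI_SCHEMA = {
    "type": "object",
    "properties": {
        "page": {"type": "integer"},
        "date_iso": {"type": "string"},
        "amount_usd": {"type": "number"},
        "party": {"type": "string"},
    },
    "required": ["page", "date_iso", "amount_usd", "party"],
}

# Sketch with the google-genai SDK (not executed here):
# from google import genai
# client = genai.Client()
# resp = client.models.generate_content(
#     model="gemini-2.5-pro",
#     contents=[pdf_file, "Return only JSON conforming to the schema."],
#     config={"response_mime_type": "application/json",
#             "response_schema": KPI_SCHEMA},
# )
```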
·····
.....
Bring your own tools: Function Calling for retrieval & actions.
Declare functions (name/description/typed params) and let Gemini call tools when it needs external data—DB lookups, Drive search by ID, or page-chunk retrieval for PDFs. Feed tool results back to the model in the next turn. This keeps prompts short while grounding answers in your source.
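A declaration in that shape is just structured data: name, description, and typed parameters. The sketch below shows one for page-chunk retrieval; `search_pdf_pages` and its parameters are hypothetical names for illustration, and the tool-call loop is commented because it needs a live API client.

```python
# A function declaration in the name/description/typed-params shape used
# by Gemini function calling. search_pdf_pages is a hypothetical tool.
SEARCH_TOOL = {
    "name": "search_pdf_pages",
    "description": "Return the most relevant page chunks for a query.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "k": {"type": "integer"},
        },
        "required": ["query"],
    },
}

# Sketch of the loop (google-genai SDK, not executed here):
# resp = client.models.generate_content(
#     model="gemini-2.5-pro",
#     contents=[...],
#     config={"tools": [{"function_declarations": [SEARCH_TOOL]}]},
# )
# If resp contains a function call, run the tool locally and feed the
# result back to the model in the next turn.
```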
Typical PDF pipeline: Files API upload → chunk/index → retrieve top-k → model synthesis (JSON schema). Use function calling for the retrieval step and for post-processing (e.g., write CSV to Cloud Storage).
·····
.....
Recommended workflows (step-by-step).
A) PDF KPI extraction (AI Studio)
1) Upload the PDF via Files API (2 GB cap). 2) Chunk/index (Cloud Functions or your app). 3) Retrieve top-k sections for each query (tool/function). 4) Ask Gemini to emit KPIs as validated JSON. 5) Save CSV/JSON downstream.
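Steps 2–3 of the workflow above (chunk, then retrieve top-k) can be sketched offline. A keyword-overlap scorer stands in for real embedding retrieval here; production systems would score chunks with embeddings instead.

```python
# Toy chunk + top-k retrieval for the pipeline above. Keyword overlap
# is a stand-in for embedding-based scoring.
def chunk(text: str, size: int = 200) -> list[str]:
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def top_k(chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Keep the k chunks sharing the most words with the query."""
    terms = set(query.lower().split())
    return sorted(chunks,
                  key=lambda c: len(terms & set(c.lower().split())),
                  reverse=True)[:k]
```

Only the retrieved chunks (not the whole document) then go to the model, which keeps the prompt small and the answer grounded.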
B) Slide & chart explanation (Vertex)
Use Gemini 2.5 Pro with images (charts/screens) respecting per-image caps (7 MB, up to 3,000 per call). Ask for a two-column Markdown: “Insight” and “Evidence (figure/page).”
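The per-image caps in workflow B can be pre-checked the same way as file sizes. A small sketch, treating the 7 MB limit as MiB for a conservative bound:

```python
# Pre-flight for the Vertex image caps cited above: at most 3,000 images
# per call and roughly 7 MB each (MiB used as a conservative bound).
MAX_IMAGE_BYTES = 7 * 1024**2
MAX_IMAGES_PER_CALL = 3000

def images_ok(sizes_bytes: list[int]) -> bool:
    """True if the batch fits both the count and per-image size caps."""
    return (len(sizes_bytes) <= MAX_IMAGES_PER_CALL
            and all(s <= MAX_IMAGE_BYTES for s in sizes_bytes))
```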
C) Audio/lecture recap (Apps → Studio)
Upload audio in Gemini Apps to summarize/transcribe, then bring the cleaned text into AI Studio for structured extraction. Plan around file count/length limits (apps: 10 files/prompt; plan-based time caps).
·····
.....
Prompt templates that work.
Goal | Template |
Page-scoped Q&A (PDF) | “From pages 9–14 of the uploaded PDF (id: ___), extract dates, amounts (USD), counterparties. Return JSON {page, date_iso, amount_usd, party}.” |
Chart explanation (images) | “Explain this chart in 3 bullets: trend, outliers, implication. Then list the exact labels/value ranges you used.” |
Table to CSV | “Convert the table on page 5 to CSV with headers. Normalize thousands separators and ISO-date the first column.” |
Policy diff (two PDFs) | “Compare sections titled ‘Data Retention’ across the two docs; return a table {clause, old_text, new_text, risk_note}.” |
Entity extraction (doc/CSV) | “Return only JSON per schema {entity, category_enum, source_ref}. If missing, set category_enum to unknown.” |
Pair these with responseSchema to lock structure.
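Even with a schema, it is cheap to guard the parse on your side. A small helper under one assumption (models occasionally wrap JSON output in a markdown code fence):

```python
import json

def parse_json_only(text: str) -> dict:
    """Parse a 'JSON only' response, tolerating an optional code fence."""
    t = text.strip()
    if t.startswith("```"):
        t = t.strip("`")
        # drop an optional language tag like "json" on the first line
        t = t.split("\n", 1)[1] if "\n" in t else t
    return json.loads(t)  # raises ValueError on anything non-JSON
```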
·····
.....
Troubleshooting (quick fixes).
Symptom | Likely cause | Fix |
“File too large” (Studio) | > 2 GB file or project > 20 GB | Split or compress; rotate inactive files; mind 48-hour retention. |
Truncated image set (Vertex) | Exceeded 7 MB/image or 3,000 images | Compress images; batch calls. |
Messy table extraction | Complex layout/scan | Render page to image; ask for CSV; use retrieval by page. |
Inconsistent JSON | No schema enforcement | Use Structured Output (responseSchema) and ask for “JSON only.” |
Slow/expensive long prompts | Over-stuffed context | Use retrieval via function calling; pass only relevant chunks. |
·····
.....
Why long context matters (and how not to waste it).
Gemini’s ~1M-token window makes whole-report analysis feasible, but you’ll still get better latency, cost, and accuracy by retrieving the right 2–8 chunks per question and constraining outputs. Treat the big window as headroom, not a target.
·····
.....
Quick comparison of ingestion paths.
Path | Best for | Pros | Cons |
Files API (AI Studio) | Prototyping multimodal prompts | 2 GB/file; free; simple | 48-hour retention; 20 GB/project cap |
Vertex AI (Gemini 2.5 Pro) | Production apps, batch jobs | Enterprise quotas; image/doc specs; 1M context | Per-modality limits, billing applies |
Gemini Apps | End-user testing/triage | 10 files/prompt; quick summaries | Smaller file caps; consumer plan gating |
·····
.....
The bottom line.
Use AI Studio’s Files API for fast, free prototyping (2 GB/file, 20 GB/project, 48-hour retention). When you need governance and scale, move to Vertex AI and combine function calling + structured output with Gemini’s long context to deliver reliable PDF/table/slide Q&A. Keep inputs small with retrieval, lock outputs with schemas, and respect per-surface limits to avoid stalls and drift.
.....
FOLLOW US FOR MORE.
DATA STUDIOS
datastudios.org

