Google Gemini File Upload and Reading: formats, limits, structured outputs, and Workspace grounding

Oct 27, 2025

Google Gemini can read, summarize, and analyze files directly inside the Gemini app and across Google Workspace. You can attach PDFs, office docs, spreadsheets, images, and even audio/video, then ask targeted questions—page-scoped summaries, table extraction to CSV, chart explanations, code review, and more. When used with Drive, Gemini also grounds answers in the files you already store and share, keeping permissions intact.

·····

Where file upload works (and how it behaves).

Gemini processes files on three primary surfaces, each suited to a different stage of your workflow:

Surface	Typical use	What you can upload	Why/when to use it
Gemini app (web/mobile/desktop)	Ad-hoc analysis, triage, personal research	PDFs, Docs, Sheets (via link), images, CSV/XLSX, short audio/video	Fastest way to “ask a doc a question” or compare multiple files in one prompt
Google Workspace (Docs/Sheets/Slides/Gmail)	Work in context with Drive permissions	Files already in Drive; attach or reference by name	Answers respect sharing and org policies; easy handoff to collaborators
Developers (AI Studio / Vertex AI)	Repeatable pipelines, governance, automation	Same formats via API with project-level limits	Use structured output, function calling, and retrieval for production apps

Rule of thumb: prototype your question in the Gemini app; if it becomes a repeatable task, promote it to a Workspace macro (Apps Script), an AI Studio prompt, or a Vertex AI pipeline.

·····

Supported formats and what Gemini “reads.”

File type	What Gemini does	Practical notes
PDF	Parses text layers; identifies sections, headers, footers; can describe and reason about embedded figures and charts	Best accuracy with digital (non-scanned) PDFs; for scans, consider uploading page images alongside the PDF
Google Docs / DOCX / TXT / RTF	Summarizes, rewrites, outlines, and changes tone or style	Refer to files by Drive name for grounding and versioning
Google Sheets / CSV / XLSX	Interprets rows/columns, computes metrics, explains formulas, outputs CSV/Markdown tables	For large sheets, specify ranges (e.g., A:H, rows 2–5000)
Images (PNG/JPEG/WEBP)	Reads charts, UI screenshots, diagrams, handwriting; can cross-reference with text	Add a caption hint (“This is Q4 revenue by region”) to guide attention
Slides (PPTX/Google Slides)	Slide-by-slide summaries, speaker-note extraction, deck comparisons	Ask for slide ranges and “talking points” tables
Audio/Video (short)	Transcribes and summarizes; extracts action items	Long media works best when paired with timestamps or a target section

Gemini’s multimodal stack means you can combine files—e.g., a PDF + a CSV + a screenshot of a chart—and ask it to reconcile numbers across them.

·····

Upload caps and practical limits.

Exact quotas vary by account, plan, and surface, so treat the following as planning guidelines for day-to-day work rather than guarantees:

Number of files per prompt: up to 10 items is typical.
Per-file size: most document and image types are comfortable up to ~100 MB; video uploads are commonly accepted up to ~2 GB, subject to separate length limits.
Context size: Gemini’s long-context models accept very large prompts, but you’ll get better speed and accuracy by referencing specific pages, ranges, or sections rather than dumping entire corpora.

If a file is rejected for size or length, split by section (chapters for PDFs, ranges for sheets, selected slides for decks) and run a staged workflow (see below).
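A staged workflow can be planned before anything is uploaded. The Python sketch below splits a long PDF into fixed-size page ranges and turns each range into a page-scoped prompt. The function names and the 20-page chunk size are illustrative assumptions, not Gemini limits.

```python
def page_chunks(total_pages: int, pages_per_chunk: int) -> list[tuple[int, int]]:
    """Split 1-indexed pages 1..total_pages into inclusive (start, end) ranges."""
    if total_pages < 1 or pages_per_chunk < 1:
        raise ValueError("total_pages and pages_per_chunk must be positive")
    return [
        (start, min(start + pages_per_chunk - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_chunk)
    ]


def staged_prompts(total_pages: int, pages_per_chunk: int, task: str) -> list[str]:
    """Build one page-scoped prompt per chunk (hypothetical helper)."""
    return [
        f"From pages {start}-{end} of the PDF, {task}"
        for start, end in page_chunks(total_pages, pages_per_chunk)
    ]


if __name__ == "__main__":
    # Example: a 45-page report processed 20 pages at a time.
    for prompt in staged_prompts(45, 20, "extract key dates and parties."):
        print(prompt)
```

Run the prompts in order, then finish with one merge pass over the per-chunk results.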

·····

Best-practice prompting for files (works across PDFs, sheets, and slides).

Scope the request. “From pages 9–14 of the PDF, extract key dates and parties; ignore appendices.”
Pin the output format. “Return CSV with headers date_iso, counterparty, amount_usd, page; no extra commentary.”
Add cross-file grounding. “Compare totals in sheet range Revenue!B2:E13 with the ‘Financial Summary’ table on page 5 of the PDF.”
Iterate in passes. Pass 1: extract structured data; Pass 2: analyze trends; Pass 3: draft the exec summary.
Label images. “Image 1 = Q4 bar chart; Image 2 = product segmentation pie chart.”

These habits reduce token waste and produce stable, reusable outputs.
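These habits are easier to apply consistently when prompts are assembled from parts rather than typed free-form each time. This Python sketch is a hypothetical helper, not part of any Gemini SDK, that composes a scope, image labels, the task, and a pinned output format into one prompt string.

```python
from typing import Optional, Sequence


def build_file_prompt(
    task: str,
    scope: Optional[str] = None,
    output_format: Optional[str] = None,
    image_labels: Sequence[str] = (),
) -> str:
    """Compose a scoped, format-pinned prompt from reusable parts (hypothetical helper)."""
    parts = []
    if scope:
        parts.append(f"Scope: {scope}.")
    # Number images in the order they are attached: "Image 1 = ...; Image 2 = ...".
    if image_labels:
        labels = "; ".join(
            f"Image {i} = {label}" for i, label in enumerate(image_labels, start=1)
        )
        parts.append(f"{labels}.")
    parts.append(task)
    if output_format:
        parts.append(f"Output format: {output_format}. No extra commentary.")
    return " ".join(parts)


if __name__ == "__main__":
    print(
        build_file_prompt(
            "Extract key dates and parties; ignore appendices.",
            scope="pages 9-14 of the PDF",
            output_format="CSV with headers date_iso, counterparty, amount_usd, page",
        )
    )
```

Keeping the parts separate makes it simple to reuse the same output format across passes while only the scope changes.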

·····

Structured outputs: make answers machine-readable.

Gemini often complies with “return JSON only,” but you’ll get maximum reliability by defining schemas (field names, types, allowed enums) and asking for validated JSON or CSV. This is essential for:

KPI extraction from PDFs (tables → CSV).
Entity extraction (contracts → JSON {clause, page, risk}).
Sheet transformations (ranges → normalized CSV with explicit types).
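Even with a schema in the prompt, it is worth validating what comes back before it reaches Sheets or BigQuery. A minimal standard-library sketch in Python, assuming the contracts case returns a JSON array of objects with exactly the fields clause (string), page (integer), and risk (one of low, medium, high); the schema dictionary itself is illustrative.

```python
import json

# Illustrative schema for "contracts -> JSON {clause, page, risk}".
SCHEMA = {"clause": str, "page": int, "risk": str}
RISK_LEVELS = {"low", "medium", "high"}


def validate_records(raw: str) -> list:
    """Parse model output and check every record against SCHEMA.

    Raises json.JSONDecodeError for malformed or truncated JSON and
    ValueError for structural problems; both can be caught as ValueError.
    """
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            raise ValueError(f"record {i}: expected an object")
        missing = sorted(SCHEMA.keys() - record.keys())
        extra = sorted(record.keys() - SCHEMA.keys())
        if missing or extra:
            raise ValueError(f"record {i}: missing={missing} extra={extra}")
        for field, expected in SCHEMA.items():
            value = record[field]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                raise ValueError(f"record {i}: {field!r} should be {expected.__name__}")
        if record["risk"] not in RISK_LEVELS:
            raise ValueError(f"record {i}: risk must be one of {sorted(RISK_LEVELS)}")
    return data
```

When validation fails, re-ask with the error message appended instead of patching the output by hand.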

When you plan to reuse results in Sheets/BigQuery, ask for normalized number formats (e.g., no thousands separators, ISO dates).
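The same normalization can be enforced after the fact. This Python sketch assumes US-style inputs (comma thousands separator, dot decimal, optional $ sign, accounting parentheses for negatives, and month/day/year dates); other locales need a different parser.

```python
from datetime import datetime
from decimal import Decimal


def normalize_amount(text: str) -> str:
    """'$1,234.50' -> '1234.50'; '(2,000)' -> '-2000'. Assumes US formatting."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    # Accounting notation: a value in parentheses is negative.
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    value = Decimal(cleaned)  # raises decimal.InvalidOperation on non-numeric text
    return str(-value if negative else value)


def normalize_date(text: str, fmt: str = "%m/%d/%Y") -> str:
    """'03/15/2025' -> '2025-03-15' (ISO 8601 calendar date)."""
    return datetime.strptime(text.strip(), fmt).date().isoformat()
```

Decimal is used instead of float so that amounts like 1234.50 keep their exact digits when written back out.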

·····

Workspace grounding: reference Drive files by name or link.

Inside Workspace, Gemini respects the Drive permissions of the signed-in user. Instead of pasting entire documents, you can reference them:

“Summarize ClientReport_March.pdf and reconcile with Sales_Q1.xlsx.”
“Draft a reply in Gmail that cites the milestones in ProjectPlan_v4.”
“Create a slide with three bullets based on Meeting Notes (May 12).”

This keeps answers tied to the source files, leaves a clearer audit trail, and avoids redundant uploads.

·····

Ready-to-use prompts (copy/paste).

PDF → JSON (contracts): “From pages 12–20, extract confidentiality and termination clauses. Return JSON {clause_title, text, page, risk_level_enum}; use low|medium|high for risk_level_enum.”

Table extraction (reports): “Convert the table labeled ‘Quarterly Revenue by Region’ on page 5 into CSV. Normalize currency to USD; use ISO dates.”

Charts → insights: “Explain the bar chart in Image 1 in 3 bullets: trend, outliers, implication. Then list the exact labels and values you relied on.”

Sheets reconciliation: “In Revenue!B2:E13, compute totals by quarter and compare to the numbers on page 6 of the PDF. If mismatched, produce a two-row table: {source, amount, delta}.”

Slide deck summary: “Summarize Slides 2–10 into 5 bullets plus a one-sentence conclusion suitable for the executive memo.”
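For the Sheets reconciliation prompt, recomputing the totals yourself is a cheap check on the model's arithmetic. This Python sketch assumes columns B to E of Revenue!B2:E13 hold Q1 to Q4 values (an assumption about the sheet layout) and builds the two-row {source, amount, delta} table only when the sheet and PDF totals disagree. Measuring delta as PDF minus sheet, with the sheet row as the zero baseline, is a convention chosen here; the prompt does not specify one.

```python
from decimal import Decimal
from typing import Dict, List, Sequence


def quarter_totals(rows: Sequence[Sequence[Decimal]]) -> List[Decimal]:
    """Column sums for rows of [Q1, Q2, Q3, Q4] values (e.g. Revenue!B2:E13)."""
    return [sum(column, Decimal("0")) for column in zip(*rows)]


def reconcile(sheet_total: Decimal, pdf_total: Decimal,
              tolerance: Decimal = Decimal("0.01")) -> List[Dict[str, object]]:
    """Return the two-row mismatch table, or [] if totals agree within tolerance."""
    delta = pdf_total - sheet_total
    if abs(delta) <= tolerance:
        return []
    return [
        {"source": "sheet", "amount": sheet_total, "delta": Decimal("0")},
        {"source": "pdf", "amount": pdf_total, "delta": delta},
    ]


if __name__ == "__main__":
    rows = [
        [Decimal("100"), Decimal("120"), Decimal("90"), Decimal("130")],
        [Decimal("50"), Decimal("55"), Decimal("60"), Decimal("65")],
    ]
    totals = quarter_totals(rows)
    # Compare the Q1 sheet total with a Q1 figure read from the PDF.
    print(reconcile(totals[0], Decimal("155")))
```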

·····

Troubleshooting quick fixes.

Symptom	Likely cause	Fast fix
Messy PDF tables	Scanned/complex layout	Upload the relevant pages as images; ask for CSV from the image
“File too large/long”	Per-file/length cap	Split by chapter (PDF) or by range (sheet); run a staged workflow
Partial/inconsistent JSON	Free-form prompt	Specify an explicit schema and request “JSON only”
Wrong sheet referenced	Ambiguous names	Include the sheet tab name and range explicitly
Slow responses	Oversized prompt	Use page/range scoping and multi-pass extraction
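The “JSON only” fix pairs well with a parse-and-retry loop on the client side. In this Python sketch, ask is a hypothetical callable standing in for whatever sends a prompt to Gemini and returns the reply text; no specific SDK call is assumed. The helper also strips a Markdown code fence around the reply, in case one is added.

```python
import json
import re
from typing import Any, Callable

TICKS = "`" * 3  # a Markdown code fence, built indirectly to keep this listing readable
FENCE = re.compile(rf"^{TICKS}(?:json)?\s*|\s*{TICKS}$")


def parse_json_reply(reply: str) -> Any:
    """Parse a reply that should be JSON only, ignoring a wrapping code fence."""
    return json.loads(FENCE.sub("", reply.strip()))


def ask_for_json(ask: Callable[[str], str], prompt: str, max_attempts: int = 3) -> Any:
    """Call `ask` until the reply parses as JSON, feeding parse errors back."""
    last_error = None
    for _ in range(max_attempts):
        reply = ask(prompt)
        try:
            return parse_json_reply(reply)
        except json.JSONDecodeError as err:
            last_error = err
            # Tell the model what went wrong and restate the format requirement.
            prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON ({err.msg}). "
                "Return JSON only."
            )
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Combining this loop with a schema check catches both unparseable replies and replies that parse but have the wrong shape.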

·····

Example workflows (step-by-step).

A) Compliance scan on a policy PDF

Upload the PDF.
Ask for a section map (titles + page numbers).
Extract clauses into JSON by section.
Request a risk register table with {clause, risk, mitigation, owner}.
Export the register as CSV (or convert the JSON to CSV) and import it into Sheets; pasted raw JSON will not split into columns.
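Turning the risk-register JSON into CSV that Sheets can import needs only the standard library. This Python sketch assumes the model returned a JSON array of objects keyed by clause, risk, mitigation, and owner; csv.DictWriter takes care of quoting commas and quotes inside values.

```python
import csv
import io
import json

FIELDS = ["clause", "risk", "mitigation", "owner"]


def risk_register_to_csv(raw_json: str) -> str:
    """Convert a JSON array of risk-register rows into CSV text with a header."""
    records = json.loads(raw_json)
    buffer = io.StringIO()
    # extrasaction="raise" flags unexpected keys instead of silently dropping them;
    # missing keys are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="raise")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = ('[{"clause": "Data retention", "risk": "high", '
              '"mitigation": "Encrypt, then audit", "owner": "Legal"}]')
    print(risk_register_to_csv(sample))
```

Save the output with a .csv extension and bring it in through File > Import in Sheets.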

B) Sales operations from mixed sources (sheet + PDF + chart)

Reference Sales_Q1.xlsx and upload the Q4 chart image.
“Compute YoY% by region (sheet range), then validate against the chart values.”
“Return a Markdown table and a 100-word narrative for the weekly update.”
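The YoY% step in workflow B is simple enough to double-check locally. A Python sketch, assuming you have per-region current and prior-year totals from the sheet and have read the chart values as percentages; the 0.5-point tolerance is an arbitrary allowance for chart-reading rounding.

```python
from typing import Dict, List


def yoy_percent(current: float, prior: float) -> float:
    """Year-over-year change in percent: (current - prior) / prior * 100."""
    if prior == 0:
        raise ZeroDivisionError("prior-year value is zero; YoY% is undefined")
    return (current - prior) * 100 / prior


def chart_mismatches(computed: Dict[str, float], chart: Dict[str, float],
                     tolerance_pts: float = 0.5) -> List[str]:
    """Regions whose computed YoY% is missing from, or disagrees with, the chart."""
    return [
        region
        for region, value in sorted(computed.items())
        if region not in chart or abs(value - chart[region]) > tolerance_pts
    ]


if __name__ == "__main__":
    sheet = {"EMEA": (1200.0, 1000.0), "APAC": (840.0, 800.0)}  # (current, prior)
    computed = {region: yoy_percent(cur, prior) for region, (cur, prior) in sheet.items()}
    print(computed, chart_mismatches(computed, {"EMEA": 20.2, "APAC": 7.0}))
```

Any region this flags is worth raising in the prompt explicitly, so the narrative does not paper over a disagreement between sheet and chart.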

C) Research paper triage (PDF + notes)

“Summarize Abstract + Conclusions as bullets.”
“Extract methods parameters to JSON {param, value, unit}.”
“Draft three follow-up questions to verify findings.”

·····

Comparison snapshot (file reading across assistants).

Capability	Gemini	ChatGPT	Claude	Copilot
Google Drive grounding	Native	Via connected apps or upload	Upload/attach files	Best inside Microsoft 365
Multimodal (text + images)	Strong	Strong	Moderate (vision varies)	Strong in Edge/Office surfaces
Long context window	Very large	Large (tier-dependent)	Very large	Depends on app surface
Best fit	Workspace-centric teams, schema’d outputs	Broad creative and file Q&A	Explanatory/narrative depth	Office-native document workflows

Gemini’s differentiator is Workspace grounding: using Drive file names and permissions to anchor answers without moving data around.

·····

The bottom line.

Treat Gemini as a document intelligence layer across your Drive. Upload or reference files, scope your questions to pages/ranges, and lock outputs into CSV/JSON when the result will feed Sheets, Slides, or a downstream tool. For repeat tasks, promote the prompt to a Workspace macro, an AI Studio prompt, or a Vertex AI pipeline to keep your analysis fast, governed, and reproducible.

DATA STUDIOS

[datastudios.org]