Google Gemini: File Upload and Reading: formats, limits, structured outputs, and Workspace grounding
- Graziano Stefanelli
- 4 hours ago

Google Gemini can read, summarize, and analyze files directly inside the Gemini app and across Google Workspace. You can attach PDFs, office docs, spreadsheets, images, and even audio/video, then ask targeted questions—page-scoped summaries, table extraction to CSV, chart explanations, code review, and more. When used with Drive, Gemini also grounds answers in the files you already store and share, keeping permissions intact.
·····
.....
Where file upload works (and how it behaves).
Gemini processes files in three primary surfaces, each optimized for a different stage of your workflow:
Surface | Typical use | What you can upload | Why/when to use it |
Gemini app (web/mobile/desktop) | Ad-hoc analysis, triage, personal research | PDFs, Docs, Sheets (via link), images, CSV/XLSX, short audio/video | Fastest way to “ask a doc a question” or compare multiple files in one prompt |
Google Workspace (Docs/Sheets/Slides/Gmail) | Work in context with Drive permissions | Files already in Drive; attach or reference by name | Answers respect sharing and org policies; easy handoff to collaborators |
Developers (AI Studio / Vertex AI) | Repeatable pipelines, governance, automation | Same formats via API with project-level limits | Use structured output, function calling, and retrieval for production apps |
Rule of thumb: prototype your question in the Gemini app; if it becomes a repeatable task, promote it into a Workspace macro (Apps Script), AI Studio prompt, or a Vertex pipeline.
·····
.....
Supported formats and what Gemini “reads.”
File type | What Gemini does | Practical notes |
PDF | Parses text layers; identifies sections, headers, footers; can describe and reason about embedded figures and charts | Best accuracy with digital (non-scanned) PDFs; for scans, consider uploading page images alongside the PDF |
Google Docs / DOCX / TXT / RTF | Summarizes, rewrites, outlines, style-transforms | Refer to files by Drive name for grounding and versioning |
Google Sheets / CSV / XLSX | Interprets rows/columns, computes metrics, explains formulas, outputs CSV/Markdown tables | For large sheets, specify ranges (e.g., A:H, rows 2–5000) |
Images (PNG/JPEG/WEBP) | Reads charts, UI screenshots, diagrams, handwriting; can cross-reference with text | Add a caption hint (“This is Q4 revenue by region”) to guide attention |
Slides (PPTX/Google Slides) | Slide-by-slide summaries, speaker-note extraction, deck comparisons | Ask for slide ranges and “talking points” tables |
Audio/Video (short) | Transcribes and summarizes; extracts action items | Long media works best when paired with timestamps or a target section |
Gemini’s multimodal stack means you can combine files—e.g., a PDF + a CSV + a screenshot of a chart—and ask it to reconcile numbers across them.
·····
.....
Upload caps and practical limits.
Exact quotas vary by account and surface, so treat the following as planning guidelines for day-to-day work rather than guaranteed limits:
Number of files per prompt: up to 10 items is typical.
Per-file size: most document and image types work comfortably up to ~100 MB; video uploads are commonly supported up to ~2 GB, subject to length limits.
Context size: Gemini’s long-context models accept very large prompts, but you’ll get better speed and accuracy by referencing specific pages, ranges, or sections rather than dumping entire corpora.
If a file is rejected for size or length, split by section (chapters for PDFs, ranges for sheets, selected slides for decks) and run a staged workflow (see below).
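The splitting step can be planned before you upload anything. The sketch below is a minimal illustration, not part of any Gemini tooling: the helper names `split_into_ranges` and `staged_prompts` are hypothetical, and the chunk size is an assumption you would tune to your own files. It only computes page ranges and builds one scoped prompt per range.

```python
def split_into_ranges(first, last, chunk_size):
    """Split the inclusive range [first, last] into consecutive inclusive chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if last < first:
        raise ValueError("last must not be smaller than first")
    ranges = []
    start = first
    while start <= last:
        end = min(start + chunk_size - 1, last)
        ranges.append((start, end))
        start = end + 1
    return ranges


def staged_prompts(first_page, last_page, pages_per_pass):
    """Build one page-scoped prompt per chunk so each pass stays small."""
    return [
        f"From pages {start}-{end} of the PDF, extract key dates and parties; "
        "ignore appendices."
        for start, end in split_into_ranges(first_page, last_page, pages_per_pass)
    ]


if __name__ == "__main__":
    for prompt in staged_prompts(1, 10, 4):
        print(prompt)
```

The same range helper works for sheet rows or slide numbers, since it only deals with inclusive integer ranges.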
·····
.....
Best-practice prompting for files (works across PDFs, sheets, and slides).
Scope the request. “From pages 9–14 of the PDF, extract key dates and parties; ignore appendices.”
Pin the output format. “Return CSV with headers date_iso, counterparty, amount_usd, page; no extra commentary.”
Add cross-file grounding. “Compare totals in sheet range Revenue!B2:E13 with the ‘Financial Summary’ table on page 5 of the PDF.”
Iterate in passes. Pass 1: extract structured data; Pass 2: analyze trends; Pass 3: draft the exec summary.
Label images. “Image 1 = Q4 bar chart; Image 2 = product segmentation pie chart.”
These habits reduce token waste and produce stable, reusable outputs.
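If you reuse the same habits often, a small template can keep prompts consistent. This is a minimal sketch under stated assumptions: the `FilePrompt` class and its fields are hypothetical names invented for illustration, and the rendered wording is just one way to phrase scope, output format, and image labels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FilePrompt:
    """Hypothetical prompt template: task, optional scope, format, image labels."""
    task: str
    scope: str = ""
    output_format: str = ""
    image_labels: List[str] = field(default_factory=list)

    def render(self):
        parts = []
        if self.image_labels:
            # Number the images in upload order: Image 1, Image 2, ...
            labels = "; ".join(
                f"Image {i} = {label}"
                for i, label in enumerate(self.image_labels, start=1)
            )
            parts.append(labels + ".")
        if self.scope:
            parts.append(f"Scope: {self.scope}.")
        parts.append(self.task.rstrip(".") + ".")
        if self.output_format:
            parts.append(f"Output: {self.output_format}.")
        return " ".join(parts)


if __name__ == "__main__":
    prompt = FilePrompt(
        task="Extract key dates and parties",
        scope="pages 9-14 of the PDF, ignoring appendices",
        output_format="CSV with headers date_iso, counterparty, amount_usd, page",
    )
    print(prompt.render())
```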
·····
.....
Structured outputs: make answers machine-readable.
Gemini often complies with “return JSON only,” but results are more reliable when you define a schema (field names, types, allowed enum values) and ask for JSON or CSV that conforms to it. This is essential for:
KPI extraction from PDFs (tables → CSV).
Entity extraction (contracts → JSON {clause, page, risk}).
Sheet transformations (ranges → normalized CSV with explicit types).
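Whatever schema you ask for, it helps to check the returned text before it reaches a downstream tool. The following is a minimal sketch using only the Python standard library. The schema layout (`CLAUSE_SCHEMA`), the `low|medium|high` values for `risk`, and the function name `validate_records` are assumptions made for illustration; they are not a Gemini feature.

```python
import json

# Hypothetical schema: field name -> (expected type, allowed values or None).
CLAUSE_SCHEMA = {
    "clause": (str, None),
    "page": (int, None),
    "risk": (str, {"low", "medium", "high"}),
}


def validate_records(raw_json, schema):
    """Parse a JSON array of objects and return (valid_records, errors)."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [], [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, list):
        return [], ["expected a JSON array of objects"]

    valid, errors = [], []
    for index, record in enumerate(data):
        problems = []
        if not isinstance(record, dict):
            problems.append("not an object")
        else:
            for name, (expected_type, allowed) in schema.items():
                if name not in record:
                    problems.append(f"missing field '{name}'")
                    continue
                value = record[name]
                # bool is a subclass of int in Python, so reject it for int fields.
                wrong_bool = isinstance(value, bool) and expected_type is not bool
                if wrong_bool or not isinstance(value, expected_type):
                    problems.append(f"field '{name}' should be {expected_type.__name__}")
                elif allowed is not None and value not in allowed:
                    problems.append(f"field '{name}' has unexpected value {value!r}")
        if problems:
            errors.append(f"record {index}: " + "; ".join(problems))
        else:
            valid.append(record)
    return valid, errors
```

Records that fail the check can be sent back in a follow-up prompt together with the error messages, rather than being silently dropped.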
When you plan to reuse results in Sheets/BigQuery, ask for normalized number formats (e.g., no thousands separators, ISO dates).
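If the output still arrives with formatted numbers or mixed date styles, a small normalization pass can clean it up before import. This sketch assumes US-style input (comma as thousands separator, dot as decimal point, an optional leading dollar sign) and only two date layouts; both helper names are hypothetical, and you would extend the format list for your own data.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def normalize_number(text):
    """Turn '1,234,567.50' or '$1,200' into a Decimal.

    Assumes comma = thousands separator and dot = decimal point.
    """
    cleaned = text.strip().replace(",", "").lstrip("$")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"unrecognized number: {text!r}")


def normalize_date(text, input_formats=("%Y-%m-%d", "%m/%d/%Y")):
    """Return an ISO 8601 date string (YYYY-MM-DD) for the first format that matches."""
    for fmt in input_formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")
```

Using `Decimal` instead of `float` keeps currency amounts exact when they are later summed or compared.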
·····
.....
Workspace grounding: reference Drive files by name or link.
Inside Workspace, Gemini respects the Drive permissions of the signed-in user. Instead of pasting entire documents, you can reference them:
“Summarize ClientReport_March.pdf and reconcile with Sales_Q1.xlsx.”
“Draft a reply in Gmail that cites the milestones in ProjectPlan_v4.”
“Create a slide with three bullets based on Meeting Notes (May 12).”
This delivers accurate answers and a clean audit trail while avoiding redundant uploads.
·····
.....
Ready-to-use prompts (copy/paste).
PDF → JSON (contracts): “From pages 12–20, extract confidentiality and termination clauses. Return JSON {clause_title, text, page, risk_level_enum}; use low|medium|high for risk_level_enum.”
Table extraction (reports): “Convert the table labeled ‘Quarterly Revenue by Region’ on page 5 into CSV. Normalize currency to USD; use ISO dates.”
Charts → insights: “Explain the bar chart in Image 1 in 3 bullets: trend, outliers, implication. Then list the exact labels and values you relied on.”
Sheets reconciliation: “In Revenue!B2:E13, compute totals by quarter and compare to the numbers on page 6 of the PDF. If mismatched, produce a two-row table: {source, amount, delta}.”
Slide deck summary: “Summarize Slides 2–10 into 5 bullets plus a one-sentence conclusion suitable for the executive memo.”
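For the reconciliation prompt, it can be worth recomputing the comparison yourself once you have the two totals, as an independent check on the answer. The sketch below is one possible shape for the {source, amount, delta} table; the function name `reconcile`, the sign convention for `delta`, and the tolerance parameter are assumptions for illustration.

```python
from decimal import Decimal


def reconcile(sheet_total, pdf_total, tolerance="0.00"):
    """Return [] when totals agree, else a two-row table {source, amount, delta}.

    delta is each source's amount minus the other source's amount.
    """
    sheet = Decimal(sheet_total)
    pdf = Decimal(pdf_total)
    delta = sheet - pdf
    if abs(delta) <= Decimal(tolerance):
        return []
    return [
        {"source": "sheet", "amount": sheet, "delta": delta},
        {"source": "pdf", "amount": pdf, "delta": -delta},
    ]


if __name__ == "__main__":
    for row in reconcile("1250.50", "1200.00"):
        print(row)
```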
·····
.....
Troubleshooting quick fixes.
Symptom | Likely cause | Fast fix |
Messy PDF tables | Scanned/complex layout | Upload the relevant pages as images; ask for CSV from the image |
“File too large/long” | Per-file/length cap | Split by chapter (PDF) or by range (sheet); run a staged workflow |
Partial/inconsistent JSON | Free-form prompt | Specify an explicit schema and request “JSON only” |
Wrong sheet referenced | Ambiguous names | Include the sheet tab name and range explicitly |
Slow responses | Oversized prompt | Use page/range scoping and multi-pass extraction |
·····
.....
Example workflows (step-by-step).
A) Compliance scan on a policy PDF
Upload the PDF.
Ask for a section map (titles + page numbers).
Extract clauses into JSON by section.
Request a risk register table with {clause, risk, mitigation, owner}.
Paste JSON into Sheets or export CSV.
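The final step of workflow A, turning the JSON risk register into CSV, can be sketched with the Python standard library. This assumes the JSON is an array of objects using the {clause, risk, mitigation, owner} fields from step 4; the function name is hypothetical, and missing fields are written as empty cells.

```python
import csv
import io
import json

RISK_FIELDS = ["clause", "risk", "mitigation", "owner"]


def risk_register_to_csv(raw_json):
    """Convert a JSON array of risk-register objects into CSV text."""
    rows = json.loads(raw_json)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=RISK_FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Keep only the expected columns; fill missing ones with an empty string.
        writer.writerow({name: row.get(name, "") for name in RISK_FIELDS})
    return buffer.getvalue()


if __name__ == "__main__":
    sample = '[{"clause": "Data retention", "risk": "high", "owner": "Legal"}]'
    print(risk_register_to_csv(sample))
```

The `csv` module quotes any cell that contains a comma, so free-text mitigations survive the import into Sheets.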
B) Sales operations from mixed sources (sheet + PDF + chart)
Reference Sales_Q1.xlsx and upload the Q4 chart image.
“Compute YoY% by region (sheet range), then validate against the chart values.”
“Return a Markdown table and a 100-word narrative for the weekly update.”
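To sanity-check the YoY figures in workflow B, the same calculation can be run locally on the exported sheet values. This is a minimal sketch: the region names and numbers in the usage example are made up, both function names are hypothetical, and regions with no prior-year baseline are reported as "n/a" rather than guessed.

```python
def yoy_percent(current, previous):
    """Year-over-year change in percent per region.

    Returns None for a region when the prior-year value is missing or zero.
    """
    result = {}
    for region, value in current.items():
        base = previous.get(region)
        if base is None or base == 0:
            result[region] = None
        else:
            result[region] = round((value - base) / base * 100, 1)
    return result


def to_markdown(changes):
    """Render the per-region changes as a Markdown table."""
    lines = ["| Region | YoY % |", "| --- | --- |"]
    for region, pct in changes.items():
        cell = "n/a" if pct is None else f"{pct:+.1f}"
        lines.append(f"| {region} | {cell} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative numbers only.
    this_year = {"EMEA": 120.0, "APAC": 90.0, "LATAM": 50.0}
    last_year = {"EMEA": 100.0, "APAC": 100.0}
    print(to_markdown(yoy_percent(this_year, last_year)))
```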
C) Research paper triage (PDF + notes)
“Summarize Abstract + Conclusions as bullets.”
“Extract methods parameters to JSON {param, value, unit}.”
“Draft three follow-up questions to verify findings.”
·····
.....
Comparison snapshot (file reading across assistants).
Capability | Gemini | ChatGPT | Claude | Copilot |
Google Drive grounding | Native | Via connected apps or upload | Upload/attach files | Best inside Microsoft 365 |
Multimodal (text + images) | Strong | Strong | Moderate (vision varies) | Strong in Edge/Office surfaces |
Long context window | Very large | Large (tier-dependent) | Very large | Depends on app surface |
Best fit | Workspace-centric teams, schema’d outputs | Broad creative and file Q&A | Explanatory/narrative depth | Office-native document workflows |
Gemini’s differentiator is Workspace grounding: using Drive file names and permissions to anchor answers without moving data around.
·····
.....
The bottom line.
Treat Gemini as a document intelligence layer across your Drive. Upload or reference files, scope your questions to pages/ranges, and lock outputs into CSV/JSON when the result will feed Sheets, Slides, or a downstream tool. For repeat tasks, promote the prompt into a Workspace macro, AI Studio prompt, or Vertex pipeline and keep your analysis fast, governed, and reproducible.