Gemini for extracting structured content from complex PDFs

Graziano Stefanelli
Sep 15
3 min read

Gemini integrates with Drive and Vertex AI to extract usable content from uploaded PDFs.

Gemini offers two core pathways for working with PDFs: Google Drive summaries for everyday users, and Vertex AI endpoints for developers who require structured data. When a PDF is opened in Google Drive, Gemini automatically generates a sidebar summary with key bullet points, section headings, and—when applicable—embedded tables. This summary includes a “Copy to Sheets” button, enabling one-click transfer of table data into a spreadsheet.

This feature was rolled out broadly in June 2025 and has been designed to function seamlessly within the Google Workspace interface. File uploads up to 100 MB are supported, although performance is optimal under that threshold. Larger files are accepted but may result in partial or incomplete extractions depending on layout complexity.

Drive summary cards highlight tables, propose follow-up actions, and simplify export.

When Gemini detects tables in a PDF, the Drive summary interface proposes structured follow-up actions. Examples include:

“Copy table to Sheets”, which preserves tabular formatting.
“Summarize this PDF” or “Draft a response based on content”.
“List key takeaways”, especially in business, legal, or HR documents.

This functionality is contextual: if Gemini detects formal structure (tables, headings, numbered sections), it surfaces export tools. If the document is unstructured or scanned as an image, Gemini attempts OCR but with limited formatting accuracy.

Workspace admins can disable or restrict these features in the Admin Console under “Drive → Features → AI summaries,” applying domain-level DLP if needed.

Vertex AI allows developers to extract structured JSON from PDF files.

For technical workflows, Gemini can extract structured information—tables, section headings, key-value lists—via the generativeContent endpoint in Vertex AI. This is accessible from:

Google Cloud Console
REST API
Python, Node.js, and C# SDKs

To use this method, a developer uploads a PDF to a Google Cloud Storage (GCS) bucket and calls the Gemini model with:

mimeType: application/pdf
structuredOutputConfig enabled
A custom prompt instructing the model to extract specific elements (e.g., tables, headings, metadata)

Example prompt schema:

{
  "task": "Extract tables as JSON with columns and row numbers",
  "output_format": "array of objects",
  "range": "pages 3–12 only"
}

The output is a fully structured JSON array or dictionary, which can then be passed to downstream apps or analytics pipelines.

Prompt engineering and layout awareness are key to accurate table extraction.

Gemini performs well on moderately complex documents, but results may vary when handling:

Merged cells or multilevel headers: table headers may drop or misalign. Use prompts like “Flatten merged cells; repeat headers in each row” to improve output.
Scanned image-based PDFs: OCR is available but struggles with skewed scans. Preprocessing with Google Cloud Vision's detectDocumentText improves results.
Dense diagrams or vector drawings: non-textual data may inflate token size or trigger truncation. For better performance, instruct Gemini to analyze only specific page ranges.

In real-world testing, Gemini consistently handled documents up to 300 pages, although Google does not officially guarantee performance at that size. The best results are consistently achieved on files under 100 MB with standardized table layouts.

Summary cards in Drive and JSON extraction in Vertex AI meet different user needs.

Feature	Google Drive + Gemini	Vertex AI + Gemini API
Target user	General Workspace user	Developer / analyst
Input method	Drag-and-drop in Drive	Upload to GCS bucket
Output	Bullet summary, action cards	Structured JSON (tables, headings)
File size	Up to 100 MB recommended	Tested up to ~300 pages
OCR support	Yes (built-in)	Yes, with Vision API optional
Admin control	Can disable in Admin Console	Follows GCP IAM and retention policies

End users gain quick insights and export tools through Drive, while enterprise developers can perform complex PDF parsing at scale using Gemini’s generative capabilities. This dual-layer model makes Gemini suitable for tasks ranging from daily reading to automated document intelligence across an organization.

Data stays within Google’s infrastructure and respects workspace governance.

Gemini’s PDF features comply with Google Workspace and Google Cloud’s security policies:

In Drive: Data never leaves the user’s domain; summaries follow Drive’s file permissions.
In Vertex AI: Uploaded files reside in the customer’s encrypted GCS bucket; output retention and audit logs follow GCP IAM and DLP settings.
Training exclusion: Gemini does not use Drive or Vertex AI user data for training unless explicitly permitted under enterprise settings.

This ensures that extracted data remains confidential and under organizational control, even when Gemini is used for advanced parsing, analysis, or transformation of complex documents.

____________

DATA STUDIOS

datastudios.org