Google Gemini file upload: types, formats, and capabilities

Google Gemini now supports a wide range of file formats and workflows, designed to improve document analysis, multimedia processing, and integrated workspace usage. The system offers different capabilities depending on where you interact with Gemini: the web app, mobile app, Google Workspace integration, or the developer-focused API. Recent updates have expanded supported formats, increased file size limits, and introduced new tools for code repositories and spreadsheet-heavy datasets.



Gemini supports a wide variety of document formats.

Gemini provides broad support for text-based file types, allowing users to upload and analyze both simple and structured documents. In the web and mobile apps, users can work with formats such as DOC, DOCX, PDF, RTF, DOT, DOTX, HWP, HWPX, TXT, and Google Docs. Presentations in PPTX and Google Slides are supported as well, enabling slide-by-slide summarization and Q&A.


For spreadsheet workflows, Gemini handles XLS, XLSX, CSV, and TSV files and works with Google Sheets natively. These formats allow detailed analysis of structured data, calculations, and tabular reporting, with AI-generated summaries and insights available directly in the interface. When these files are stored in Google Drive, Gemini can work with them directly, without a separate upload.
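
As a rough illustration of the format lists above, the sketch below maps a filename's extension to a document category before upload. The extensions come from this article; the `FORMATS` table and the `classify` helper are hypothetical names, not part of any Gemini interface, and Google Docs, Sheets, and Slides are left out because they are not identified by a file extension.

```python
from pathlib import Path
from typing import Optional

# Extensions as listed in this article; the grouping is illustrative only.
FORMATS = {
    "document": {".doc", ".docx", ".pdf", ".rtf", ".dot", ".dotx",
                 ".hwp", ".hwpx", ".txt"},
    "presentation": {".pptx"},
    "spreadsheet": {".xls", ".xlsx", ".csv", ".tsv"},
}


def classify(filename: str) -> Optional[str]:
    """Return the category for a filename, or None if the extension is not listed."""
    ext = Path(filename).suffix.lower()  # case-insensitive match
    for category, extensions in FORMATS.items():
        if ext in extensions:
            return category
    return None
```

A caller could use `classify` to warn about unlisted extensions before attempting an upload. It checks the name only, not the file contents.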



Image, video, and audio processing are fully integrated.

Gemini is now optimized for handling multimedia formats across different workflows. For images, supported formats include JPEG, JPG, PNG, WEBP, and HEIF, making it compatible with modern high-resolution photo standards. Uploads allow visual inspection, content extraction, and multimodal Q&A, giving Gemini the ability to parse diagrams, graphs, and screenshots in a single prompt.


For video, Gemini accepts MP4, MOV, MPEG, MPG, AVI, WMV, FLV, WEBM, and 3GPP formats. The standard plan allows analysis of up to 5 minutes per video, while users subscribed to Gemini Advanced (AI Pro or Ultra) can process videos up to 1 hour long. The longer limit benefits use cases like long-form lectures, training sessions, and product demonstrations.


Audio handling now includes WAV, MP3, AIFF, AAC, OGG Vorbis, and FLAC. These formats enable transcription, summarization, and insight extraction from spoken content. In the developer API, supported audio inputs extend to podcasts and long-form recordings, with a maximum total duration of 9 hours and 30 minutes per request.
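
The duration limits quoted above can be expressed as a small pre-flight check. This is a minimal sketch: the plan labels (`"standard"`, `"pro_ultra"`) and the function names are hypothetical, and the numbers are the ones stated in this article rather than values read from any Gemini API.

```python
from datetime import timedelta
from typing import Iterable

# Limits as quoted in this article; plan labels are made-up identifiers.
VIDEO_LIMITS = {
    "standard": timedelta(minutes=5),
    "pro_ultra": timedelta(hours=1),
}
API_AUDIO_LIMIT = timedelta(hours=9, minutes=30)  # total audio per API request


def video_fits(duration: timedelta, plan: str) -> bool:
    """True if a single video is within the per-video limit for the plan."""
    return duration <= VIDEO_LIMITS[plan]


def audio_request_fits(durations: Iterable[timedelta]) -> bool:
    """True if the combined audio length of one request is within the API limit."""
    return sum(durations, timedelta()) <= API_AUDIO_LIMIT
```

For example, a 12-minute recording would fail `video_fits` on the standard plan but pass on the Pro/Ultra tier.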


Gemini enables code repository uploads and structured ZIP analysis.

Gemini’s advanced tiers provide dedicated functionality for developers and technical users working with large collections of files. Entire code repositories or project folders can be uploaded, with up to 5,000 files per chat and a maximum of 100 MB per repository. These capabilities make it possible to analyze cross-file dependencies, troubleshoot logic issues, or summarize documentation across multi-layered codebases.
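
Before uploading a folder, it can help to check it against the two repository limits mentioned above. The sketch below walks a local directory and compares the totals with 5,000 files and 100 MB. The helper names are hypothetical, 1 MB is assumed to mean 1,048,576 bytes, and every file (including anything under `.git`) is counted, which may not match how Gemini itself counts.

```python
import os

MAX_FILES = 5_000                 # per chat, as quoted in this article
MAX_BYTES = 100 * 1024 * 1024     # 100 MB, assuming binary megabytes


def repo_stats(root: str) -> tuple:
    """Return (file_count, total_bytes) for all regular files under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so broken links do not raise
            count += 1
            total += os.path.getsize(path)
    return count, total


def within_repo_limits(file_count: int, total_bytes: int) -> bool:
    """True if both totals are within the limits quoted above."""
    return file_count <= MAX_FILES and total_bytes <= MAX_BYTES
```

Calling `within_repo_limits(*repo_stats("path/to/project"))` gives a quick yes or no; a folder that fails could be trimmed (build output, vendored dependencies) before upload.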


For compressed archives, ZIP uploads are supported, allowing up to 10 individual files per archive. When combined with Gemini’s structured output capabilities, this makes it easier to process grouped datasets or bundled document sets without manually uploading each file.
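
The 10-file limit for ZIP archives is easy to verify locally with Python's standard `zipfile` module. This is a sketch under the assumption that directory entries do not count toward the limit; the function names are illustrative.

```python
import zipfile

MAX_ZIP_FILES = 10  # per archive, as quoted in this article


def zip_file_count(path_or_file) -> int:
    """Count regular file entries in a ZIP, ignoring directory entries."""
    with zipfile.ZipFile(path_or_file) as archive:
        return sum(1 for info in archive.infolist() if not info.is_dir())


def zip_within_limit(path_or_file) -> bool:
    """True if the archive holds no more files than the quoted limit."""
    return zip_file_count(path_or_file) <= MAX_ZIP_FILES
```

Both helpers accept either a path or an open binary file object, since `zipfile.ZipFile` handles both.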



Integration with Google Workspace improves workflows.

Gemini’s connection with Google Workspace provides a seamless experience when working inside Docs, Sheets, Slides, and Drive. Within Google Drive, Gemini can process PDFs and videos directly from storage, enabling contextual summaries and instant answers without requiring manual downloads or uploads.


For Sheets and Docs, the integrated “Ask Gemini” panel enables AI-powered insights inside the document itself. Users can run queries, extract statistics, summarize datasets, and generate contextual responses directly within their Workspace environment. This integration significantly improves productivity when managing complex spreadsheets or large text-heavy documents.


The developer Files API expands Gemini’s file handling capabilities.

For technical users and enterprise applications, Gemini’s Files API provides additional flexibility with increased size limits and automation capabilities. Each uploaded file can be up to 2 GB, and a project can store up to 20 GB of data in total. Uploaded files are retained for 48 hours. The API supports all major document, image, audio, and video formats, enabling developers to build custom solutions around Gemini’s file-processing engine.
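
The three Files API figures above (2 GB per file, 20 GB per project, 48-hour retention) can be modeled as a simple quota check. The sketch below does not call the Files API; it only does the arithmetic. The names are hypothetical, and 1 GB is assumed to mean 1,073,741,824 bytes.

```python
from datetime import datetime, timedelta, timezone

GB = 1024 ** 3                    # assumption: binary gigabytes
MAX_FILE_BYTES = 2 * GB           # per uploaded file
MAX_PROJECT_BYTES = 20 * GB       # total stored per project
RETENTION = timedelta(hours=48)   # retention period quoted in this article


def can_upload(file_bytes: int, stored_bytes: int) -> bool:
    """True if the file fits the per-file limit and the remaining project quota."""
    return (file_bytes <= MAX_FILE_BYTES
            and stored_bytes + file_bytes <= MAX_PROJECT_BYTES)


def expires_at(uploaded_at: datetime) -> datetime:
    """Time at which a file uploaded at `uploaded_at` would stop being retained."""
    return uploaded_at + RETENTION
```

A pipeline that reuses uploads across requests could compare `expires_at(...)` with `datetime.now(timezone.utc)` to decide whether a file needs to be uploaded again.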


This interface is especially useful for high-volume workflows like transcript generation, dataset parsing, large-scale code reviews, or integrating Gemini with enterprise reporting pipelines. In multimodal use cases, developers can combine uploaded PDFs, images, and video references within a single request, leveraging Gemini’s extended context window to manage complex inputs.


Key upload limits and supported formats.

| Category | Supported formats | Max size & duration | Availability |
|---|---|---|---|
| Documents | DOC, DOCX, PDF, RTF, TXT, HWP, DOT, Google Docs | 100 MB | All plans |
| Spreadsheets | XLS, XLSX, CSV, TSV, Google Sheets | 100 MB | Advanced tier for large datasets |
| Presentations | PPTX, Google Slides | 100 MB | All plans |
| Images | JPG, JPEG, PNG, WEBP, HEIF | 100 MB | All plans |
| Video | MP4, MOV, MPEG, AVI, WMV, WEBM, 3GPP, FLV | 2 GB (5 min free, 1 h Pro/Ultra) | All plans |
| Audio | MP3, WAV, AIFF, AAC, OGG, FLAC | 100 MB / 9 h 30 min via API | All plans |
| Code repositories | GitHub or local folders | 5,000 files / 100 MB | Pro/Ultra |
| ZIP archives | ZIP (≤10 files) | 100 MB | All plans |
| Files API | Documents, media, datasets | 2 GB per file / 20 GB per project | Developers |


Gemini’s file-handling capabilities are evolving quickly, driven by the growing demand for multimodal AI interactions. With its expanding compatibility across documents, spreadsheets, images, videos, and entire codebases, Gemini has become one of the most versatile assistants for professionals, researchers, and developers managing diverse datasets.




DATA STUDIOS

