Google Gemini file upload: types, formats, and capabilities
- Graziano Stefanelli
- Aug 27
- 4 min read

Google Gemini now supports a wide range of file formats and workflows, designed to improve document analysis, multimedia processing, and integrated workspace usage. The system offers different capabilities depending on where you interact with Gemini: the web app, mobile app, Google Workspace integration, or the developer-focused API. Recent updates have expanded supported formats, increased file size limits, and introduced new tools for code repositories and spreadsheet-heavy datasets.
Gemini supports a wide variety of document formats.
Gemini provides broad support for text-based file types, allowing users to upload and analyze both simple and structured documents. In the web and mobile app, users can work with formats such as DOC, DOCX, PDF, RTF, DOT, DOTX, HWP, HWPX, TXT, and Google Docs. Presentations in PPTX and Google Slides are fully supported as well, enabling slide-by-slide summarization and Q&A capabilities.
For spreadsheet workflows, Gemini handles XLS, XLSX, CSV, and TSV files as well as native Google Sheets. These formats allow detailed analysis of structured data, calculations, and tabular reporting, with AI-generated summaries and insights available directly in the interface. Files already stored in Google Drive can be referenced in place, with no separate upload required.
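As a rough illustration of the format lists above, a client-side pre-check could map a file extension to one of these categories before attempting an upload. This is only a sketch: the `FORMATS` table and the `classify` helper are hypothetical names, and the extension sets are copied from the lists in this article rather than from any official API.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the formats listed in this article. Google Docs,
# Sheets, and Slides live in Drive and have no local file extension, so they
# are left out of this lookup.
FORMATS = {
    "document": {".doc", ".docx", ".pdf", ".rtf", ".dot", ".dotx", ".hwp", ".hwpx", ".txt"},
    "presentation": {".pptx"},
    "spreadsheet": {".xls", ".xlsx", ".csv", ".tsv"},
}


def classify(filename: str) -> Optional[str]:
    """Return the category for a filename, or None if its extension is not listed."""
    ext = Path(filename).suffix.lower()  # case-insensitive match on the last suffix
    for category, extensions in FORMATS.items():
        if ext in extensions:
            return category
    return None
```

Calling `classify("Q3-report.XLSX")` returns `"spreadsheet"`, while an unlisted extension such as `.odt` returns `None`, which a caller could treat as "convert before uploading".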
Image, video, and audio processing are fully integrated.
Gemini is now optimized for handling multimedia formats across different workflows. For images, supported formats include JPEG, JPG, PNG, WEBP, and HEIF, making it compatible with modern high-resolution photo standards. Uploads allow visual inspection, content extraction, and multimodal Q&A, giving Gemini the ability to parse diagrams, graphs, and screenshots in a single prompt.
For video, Gemini accepts MP4, MOV, MPEG, MPG, AVI, WMV, FLV, WEBM, and 3GPP formats. The standard plan allows analysis of up to 5 minutes per video, while users subscribed to Gemini Advanced (AI Pro or Ultra) can process content up to 1 hour. This upgrade benefits use cases like long-form lectures, training sessions, and product demonstrations.
Audio handling now includes WAV, MP3, AIFF, AAC, OGG Vorbis, and FLAC. These formats enable transcription, summarization, and insight extraction from spoken content. In the developer API, supported audio inputs extend to podcasts and long-form recordings, with a combined maximum duration of 9 hours and 30 minutes per request.
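The duration limits quoted above are easy to check before sending anything. The sketch below assumes the figures in this article (5 minutes of video on the standard plan, 1 hour on Pro or Ultra, and 9 hours 30 minutes of audio per API request). The plan labels and function names are hypothetical, chosen only for illustration.

```python
from datetime import timedelta
from typing import Iterable

# Limits as stated in this article; the plan keys are illustrative labels,
# not identifiers used by Google.
VIDEO_LIMITS = {
    "standard": timedelta(minutes=5),
    "pro": timedelta(hours=1),
    "ultra": timedelta(hours=1),
}
API_AUDIO_LIMIT = timedelta(hours=9, minutes=30)


def video_fits(duration: timedelta, plan: str) -> bool:
    """True if a single video is within the stated limit for the given plan."""
    return duration <= VIDEO_LIMITS[plan]


def audio_fits_api_request(durations: Iterable[timedelta]) -> bool:
    """True if the combined length of all audio clips fits in one API request."""
    total = sum(durations, timedelta())  # start from a zero-length timedelta
    return total <= API_AUDIO_LIMIT
```

A 12-minute lecture clip, for example, fails `video_fits` on the `"standard"` plan but passes on `"pro"`.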
Gemini enables code repository uploads and structured ZIP analysis.
Gemini’s advanced tiers provide dedicated functionality for developers and technical users working with large collections of files. Entire code repositories or project folders can be uploaded, supporting up to 5,000 files per chat and a maximum 100 MB per repository. These capabilities make it possible to analyze cross-file dependencies, troubleshoot logic issues, or summarize documentation across multi-layered codebases.
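Before uploading a project folder, it can help to check it against the repository limits quoted above. The following is a minimal sketch under a few assumptions: "100 MB" is read as 100 × 1024 × 1024 bytes, `.git` is skipped (whether Gemini counts it is not stated here), and `scan_repo` and `RepoStats` are hypothetical helper names.

```python
import os
from typing import NamedTuple, Sequence

MAX_FILES = 5_000                 # files per chat, as stated in this article
MAX_BYTES = 100 * 1024 * 1024     # assumption: "100 MB" means binary megabytes


class RepoStats(NamedTuple):
    file_count: int
    total_bytes: int

    @property
    def within_limits(self) -> bool:
        return self.file_count <= MAX_FILES and self.total_bytes <= MAX_BYTES


def scan_repo(root: str, skip_dirs: Sequence[str] = (".git",)) -> RepoStats:
    """Count regular files and their combined size under root."""
    count = 0
    size = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # ignore symlinks rather than follow them
            count += 1
            size += os.path.getsize(path)
    return RepoStats(count, size)
```

`scan_repo(".").within_limits` then gives a quick yes or no before an upload is attempted.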
For compressed archives, ZIP uploads are supported, allowing up to 10 individual files per archive. When combined with Gemini’s structured output capabilities, this makes it easier to process grouped datasets or bundled document sets without manually uploading each file.
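The 10-file ceiling for ZIP archives can be verified locally with Python's standard `zipfile` module. This sketch assumes that only regular files count toward the limit and that directory entries do not, which the article does not specify. The helper names are hypothetical.

```python
import zipfile

MAX_ZIP_MEMBERS = 10  # files per archive, as stated in this article


def zip_file_count(source) -> int:
    """Count regular files in a ZIP archive given a path or a binary file object."""
    with zipfile.ZipFile(source) as archive:
        # Directory entries are skipped; only actual files are counted.
        return sum(1 for info in archive.infolist() if not info.is_dir())


def zip_within_limit(source) -> bool:
    """True if the archive holds no more files than the stated limit."""
    return zip_file_count(source) <= MAX_ZIP_MEMBERS
```

An archive that fails the check could be split into several smaller ZIP files before upload.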
Integration with Google Workspace improves workflows.
Gemini’s connection with Google Workspace lets users work with it directly inside Docs, Sheets, Slides, and Drive. Within Google Drive, Gemini can process PDFs and videos directly from storage, enabling contextual summaries and instant answers without requiring manual downloads or uploads.
For Sheets and Docs, the integrated “Ask Gemini” panel enables AI-powered insights inside the document itself. Users can run queries, extract statistics, summarize datasets, and generate contextual responses directly within their Workspace environment. This integration significantly improves productivity when managing complex spreadsheets or large text-heavy documents.
The developer Files API expands Gemini’s file handling capabilities.
For technical users and enterprise applications, Gemini’s Files API provides additional flexibility with increased size limits and automation capabilities. Each uploaded file can be up to 2 GB, and each project can store up to 20 GB of data. Files are retained for 48 hours, after which they are deleted automatically. The API supports all major document, image, audio, and video formats, enabling developers to build custom solutions around Gemini’s file-processing engine.
This interface is especially useful for high-volume workflows like transcript generation, dataset parsing, large-scale code reviews, or integrating Gemini with enterprise reporting pipelines. In multimodal use cases, developers can combine uploaded PDFs, images, and video references within a single request, leveraging Gemini’s extended context window to manage complex inputs.
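To make the Files API limits concrete, here is a minimal sketch that checks a file against the 2 GB and 20 GB figures, computes when the 48-hour retention window ends, and then uploads the file and asks a question about it. It assumes the `google-genai` Python SDK is installed and an API key is available in the environment. The model name, the helper names, and the reading of "GB" as binary gigabytes are illustrative assumptions, not details from this article.

```python
import os
from datetime import datetime, timedelta

MAX_FILE_BYTES = 2 * 1024**3       # assumption: "2 GB" read as binary gigabytes
MAX_PROJECT_BYTES = 20 * 1024**3   # per-project storage stated in this article
RETENTION = timedelta(hours=48)    # retention period stated in this article


def can_upload(file_bytes: int, project_bytes_used: int = 0) -> bool:
    """True if the file fits both the per-file and the per-project limit."""
    return (file_bytes <= MAX_FILE_BYTES
            and project_bytes_used + file_bytes <= MAX_PROJECT_BYTES)


def expires_at(uploaded_at: datetime) -> datetime:
    """Estimate when an uploaded file leaves storage."""
    return uploaded_at + RETENTION


def upload_and_ask(path: str, prompt: str, model: str = "gemini-2.5-flash") -> str:
    """Upload one file and ask a question about it (requires google-genai)."""
    if not can_upload(os.path.getsize(path)):
        raise ValueError(f"{path} exceeds the per-file limit")
    from google import genai  # imported lazily so the helpers above work without the SDK

    client = genai.Client()  # assumption: the API key is set in the environment
    uploaded = client.files.upload(file=path)
    response = client.models.generate_content(model=model, contents=[uploaded, prompt])
    return response.text
```

Because uploads expire, a long-running pipeline would typically store the result of `expires_at` next to each file reference and re-upload once that time has passed.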
Key upload limits and supported formats.
Category | Supported formats | Max size & duration | Availability |
Documents | DOC, DOCX, PDF, RTF, TXT, HWP, HWPX, DOT, DOTX, Google Docs | 100 MB | All plans |
Spreadsheets | XLS, XLSX, CSV, TSV, Google Sheets | 100 MB | Advanced tier for large datasets |
Presentations | PPTX, Google Slides | 100 MB | All plans |
Images | JPG, JPEG, PNG, WEBP, HEIF | 100 MB | All plans |
Video | MP4, MOV, MPEG, MPG, AVI, WMV, WEBM, 3GPP, FLV | 2 GB (5 min standard, 1 h Pro/Ultra) | All plans |
Audio | MP3, WAV, AIFF, AAC, OGG Vorbis, FLAC | 100 MB (9 h 30 min per request via API) | All plans |
Code repositories | GitHub or local folders | 5,000 files / 100 MB | Pro/Ultra |
ZIP archives | ZIP (≤10 files) | 100 MB | All plans |
Files API | Documents, media, datasets | 2 GB per file / 20 GB per project (48 h retention) | Developers |
Gemini’s file-handling capabilities are evolving quickly, driven by the growing demand for multimodal AI interactions. With its expanding compatibility across documents, spreadsheets, images, videos, and entire codebases, Gemini has become one of the most versatile assistants for professionals, researchers, and developers managing diverse datasets.
DATA STUDIOS

