Google launches Gemini 2.5 Deep Think for AI Ultra users with parallel reasoning and 1M token context
- Graziano Stefanelli
- Aug 1
- 3 min read

A new model variant expands Gemini’s logical depth and outperforms both Grok 4 and OpenAI o3 in mathematical reasoning and code benchmarks, but remains available only to premium users on the Ultra tier.
Google has released a new experimental mode for its flagship AI model—Gemini 2.5 Deep Think—offering longer, parallelized reasoning and extended context capabilities. Integrated directly into the Gemini app (mobile and desktop), this variant is positioned above the standard Gemini 2.5 Pro and accessible only to subscribers of the Google AI Ultra plan, currently priced at $249.99 per month.
Unlike earlier iterations of Gemini, Deep Think emphasizes multi-agent coordination, longer “thinking time,” and a rigorous synthesis process that draws on multiple concurrent reasoning paths before returning an answer. This architecture reflects Google's internal research into theorem-solving and decision trees, with direct ties to the academic version of Gemini used at the 2025 International Mathematical Olympiad.

Deep Think introduces a multi-agent reasoning core with extended thinking latency.
At the core of Deep Think is a new approach to reasoning. Rather than producing a single linear output, the model internally spawns and evaluates parallel hypotheses, which are then aggregated, weighted, and synthesized into a final response. This is part of Google DeepMind’s Tree-of-Thoughts and Multi-Agent Alignment strategy—previously limited to academic or lab settings.
This process results in slower outputs (due to internal deliberation) but significantly improves results in multi-step logic, algebraic decomposition, and large-context synthesis. According to internal documentation, Deep Think uses a structured routing system that assigns queries to dedicated internal agents based on complexity and modality (text, image, audio, or code).
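The parallel-hypothesis loop described above can be sketched in miniature. This is a toy illustration only, since Deep Think's internals are not public: `generate_candidates`, `score`, and `synthesize` are hypothetical stand-ins for the parallel reasoning paths, a verifier step, and the final aggregation.

```python
# Toy sketch of "generate parallel reasoning paths, score them, synthesize".
# Not Google's implementation; all names here are illustrative stand-ins.

def generate_candidates(query: str, n_paths: int = 4) -> list[str]:
    # Stand-in for n independent reasoning paths over the same query.
    return [f"candidate answer {i} for: {query}" for i in range(n_paths)]

def score(candidate: str) -> float:
    # Stand-in for a verifier/critic model; here a deterministic toy score.
    return 1.0 / (1 + len(candidate) % 5)

def synthesize(query: str, n_paths: int = 4) -> str:
    candidates = generate_candidates(query, n_paths)
    ranked = sorted(candidates, key=score, reverse=True)
    # A real system would merge or re-verify the top paths;
    # here we simply return the highest-scoring candidate.
    return ranked[0]

print(synthesize("prove the triangle inequality"))
```

A production system would replace the toy scorer with a learned verifier and merge, rather than merely rank, the surviving paths.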

The model supports a full 1 million-token context window and large output generation.
Deep Think retains Gemini 2.5’s base architecture—a sparse mixture-of-experts multimodal transformer—but expands its limits significantly:
- Input context: up to 1 million tokens, more than any other public LLM
- Output generation: up to 192,000 tokens, optimized for document summarization, video description, and iterative coding
- Modality: native support for text, image, audio, and video inputs, integrated in a fused attention block system
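To put those limits in perspective, here is a rough back-of-envelope capacity check. The 4-characters-per-token ratio is a common heuristic for English text, not an official Gemini tokenizer figure:

```python
# Rough capacity check against the published Deep Think limits.
# CHARS_PER_TOKEN is a heuristic estimate, not Gemini's actual tokenizer.

MAX_INPUT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 192_000
CHARS_PER_TOKEN = 4  # common rule of thumb for English text

def fits_in_context(text: str) -> bool:
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens <= MAX_INPUT_TOKENS

# By this estimate, ~1M tokens is roughly 4 million characters,
# on the order of several long novels in a single prompt.
sample = "x" * 3_000_000  # ~750K tokens by this heuristic
print(fits_in_context(sample))  # True: well under the 1M-token window
```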
The model was trained on TPU v5e superclusters using Pathways and JAX, with a heavy emphasis on open scientific data and proof-based learning environments.

Benchmarks show consistent performance gains over Grok 4 and OpenAI o3.
According to data released alongside the model, Gemini 2.5 Deep Think outperforms both Grok 4 (xAI) and OpenAI o3 on a range of challenging benchmark tasks:
| Benchmark | Deep Think | Grok 4 | OpenAI o3 |
| --- | --- | --- | --- |
| Humanity's Last Exam | 34.8% | 25.4% | 20.3% |
| LiveCodeBench v6 | 87.6% | 79.0% | 72.0% |
| IMO 2025 (real-time variant) | 🥉 Bronze | — | — |
These results were achieved without external tools, suggesting that Deep Think's core reasoning capabilities are more robust than those of its competitors in pure logic and coding domains. In enterprise code generation and recursive problem-solving, the model also showed lower hallucination rates and better semantic coverage in multi-file scenarios.

The feature is exclusive to AI Ultra subscribers on the Gemini app.
As of August 1, 2025, Deep Think is available only to users subscribed to the AI Ultra plan, a premium tier offered through the Gemini app and Google One in more than 70 countries, including the U.S., Canada, Italy, Germany, UK, Australia, and Japan.
Once subscribed, users can toggle Deep Think via a model switcher icon next to “Gemini 2.5 Pro.” It runs directly in the mobile app (iOS and Android) and the desktop interface of gemini.google.com, and supports standard tool integrations (code interpreter, file analysis, web retrieval) when enabled.
The service currently enforces daily limits on prompt volume, and Google has warned of higher serving costs and slower response times compared with 2.5 Pro.

Research distribution and API access remain limited and private.
Outside the consumer Gemini app, Google is granting research access to the academic version of Deep Think—the same variant that competed at the 2025 International Mathematical Olympiad—to a select group of mathematical researchers. This model has extended theorem-solving capabilities and a longer reasoning window but is not yet commercialized.
In parallel, Google is preparing a private API rollout of Deep Think for selected enterprise developers under NDA. These APIs will include “with tools” and “tool-less” variants, with different latency profiles and usage limits. No timeline has been given for broader public availability via the Gemini API console.
For now, Deep Think remains a premium consumer preview, positioned as a demonstration of what Google’s reasoning-focused architectures can achieve when scaled up under ideal conditions.
DATA STUDIOS