ChatGPT‑5: Full Report on Models, Features, Capabilities, Performance, Pricing, and more
- Graziano Stefanelli
- Aug 17
- 7 min read

OpenAI released GPT‑5 on 7 August 2025, making the model available to all ChatGPT users and to developers via the API. OpenAI’s CEO Sam Altman described the model as “a significant step along the path to AGI” and said it feels like talking to a PhD‑level expert. GPT‑5 replaces the separate reasoning and non‑reasoning models of the GPT‑4 era; it is a unified system that automatically routes queries between a fast “main” model and a deeper “thinking” model. Several independent reviewers have described it as more reliable and versatile than previous versions. This report summarises publicly available information on GPT‑5’s architecture, capabilities, benchmarks, pricing and safety mechanisms.
Release timeline and model family
| Date/Event | Key information |
| --- | --- |
| 7 Aug 2025 – Public release | OpenAI rolled out GPT‑5 to all ChatGPT tiers (Free, Plus, Pro and Team) and to developers via its API. According to The Verge, the model is presented in ChatGPT as a single option and uses a router to decide when to invoke deeper reasoning. |
| Model variants | GPT‑5 is offered in three sizes in the API: gpt‑5, gpt‑5‑mini and gpt‑5‑nano. Each supports multiple reasoning levels (minimal/low/medium/high) and a context window of up to 400 k tokens. Free ChatGPT users access GPT‑5 and GPT‑5‑mini, while paid tiers can access the thinking variants and the high‑powered GPT‑5 Pro. |
| Deprecated models | The new family replaces earlier OpenAI models. Simon Willison notes that GPT‑4o maps to gpt‑5‑main, the o3 series becomes gpt‑5‑thinking, and GPT‑4.1‑nano corresponds to gpt‑5‑thinking‑nano. |
| Knowledge cut‑off | GPT‑5’s main model has a knowledge cut‑off of 1 Oct 2024; the mini and nano variants have cut‑offs of 31 May 2024. |
Architecture and technical design
Unified system and router
GPT‑5 changes the way OpenAI’s models operate. Previous releases (e.g., GPT‑4o and the o3 reasoning model) required users to choose between a faster model and a more capable reasoning model. GPT‑5 instead uses a real‑time router that inspects each query’s complexity, conversational context and explicit user cues (e.g., “think hard about this”) and automatically selects between a fast gpt‑5‑main model and a deeper gpt‑5‑thinking model. This design lets ChatGPT respond quickly to simple queries yet “think hard” when necessary without manual switching.
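The routing behaviour described above can be pictured with a toy heuristic. This is a minimal sketch only: the cue list, thresholds and decision rules are illustrative assumptions, not OpenAI’s actual (undisclosed) routing logic.

```python
# Toy router: inspect a query for explicit cues and rough complexity
# signals, then pick the fast "main" model or the deeper "thinking"
# model. Heuristics here are invented for illustration.

THINKING_CUES = ("think hard", "step by step", "carefully", "prove")

def route(query: str) -> str:
    """Return the model variant a simple router might choose."""
    q = query.lower()
    # Explicit user cues force the deeper model.
    if any(cue in q for cue in THINKING_CUES):
        return "gpt-5-thinking"
    # Long or multi-part queries suggest higher complexity.
    if len(q.split()) > 60 or q.count("?") > 2:
        return "gpt-5-thinking"
    return "gpt-5-main"

print(route("What's the capital of France?"))             # gpt-5-main
print(route("Think hard about this optimisation proof"))  # gpt-5-thinking
```

A production router would of course use a learned classifier over the full conversation, but the interface is the same: one query in, one model variant out.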
Context window
OpenAI lists different context limits depending on where GPT‑5 is used. In ChatGPT, the context window is 8k tokens for free users, 32k for Plus users and 128k for Pro users. The API exposes gpt‑5, mini and nano models that accept 272 k input tokens and can generate up to 128 k output tokens, giving a 400 k‑token context window. Wired also notes that ChatGPT’s interface uses a 256 k context, up from 200 k in the o3 model; this refers to the built‑in variant accessible in the chat interface and not the full API capacity.
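The API arithmetic above (272 k input + 128 k output = 400 k total) can be expressed as a small budget check. The limits are the ones quoted in this report; the helper function itself is an illustrative sketch.

```python
# GPT-5 API token limits as quoted above: 272k input tokens plus up to
# 128k output tokens, for a 400k-token total context window.

MAX_INPUT = 272_000
MAX_OUTPUT = 128_000

def fits_context(input_tokens: int, max_output_tokens: int) -> bool:
    """Check whether a request stays within the quoted API limits."""
    return input_tokens <= MAX_INPUT and max_output_tokens <= MAX_OUTPUT

print(MAX_INPUT + MAX_OUTPUT)          # 400000 total context window
print(fits_context(250_000, 100_000))  # True
print(fits_context(300_000, 100_000))  # False: input exceeds 272k
```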
Multimodality and tool integration
GPT‑5 accepts text and image inputs and can call external tools. The API includes parameters for verbosity and reasoning effort to control how much the model thinks and how long its responses are. Developers can also send raw text to tools without requiring JSON wrappers and can enforce output formats using context‑free grammars. GPT‑5 is designed for agentic workflows: it can chain multiple tool calls and maintain reasoning between steps using a new Responses API, enabling complex tasks like building applications or querying data to be completed end‑to‑end.
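The agentic pattern described above (chaining tool calls while carrying state between steps) can be sketched with a stubbed loop. In the real Responses API the plan would come from GPT‑5 itself; here the model is replaced by a canned two‑step plan, and the tool names are invented for illustration.

```python
# Minimal agentic loop: execute a sequence of (tool, argument) steps,
# accumulating results so later steps could build on earlier ones.
# run_sql is a stub tool; the plan stands in for model-generated calls.

def run_sql(query: str) -> list:
    # Stub tool: pretend to query a database.
    return [("revenue", 1200)] if "revenue" in query else []

TOOLS = {"run_sql": run_sql}

def agent_loop(plan: list) -> list:
    """Run each planned tool call in order, keeping all results."""
    state = []
    for tool_name, arg in plan:
        state.append(TOOLS[tool_name](arg))
    return state

results = agent_loop([("run_sql", "SELECT revenue"), ("run_sql", "SELECT users")])
print(results)  # [[('revenue', 1200)], []]
```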
Training methodology and safety design
OpenAI has not disclosed GPT‑5’s parameter count or full training corpus. However, the company says the model was trained on public internet data, third‑party data sources and content created or curated by human trainers, with advanced filtering to reduce personal information. It uses reinforcement learning and synthetic data generation to improve reasoning and reliability. The knowledge cut‑off of October 2024 means the model does not incorporate events after that date unless browsing is enabled.
Safe‑completions and sycophancy reduction
GPT‑5 introduces safe‑completions to handle sensitive “dual‑use” topics (such as biology or cybersecurity). Instead of either refusing or fully complying with such queries, the model provides helpful high‑level responses while avoiding actionable details. OpenAI also post‑trained GPT‑5 to reduce sycophancy (the tendency to be overly agreeable). The system card (summarised by Simon Willison) notes that the company evaluated model responses for sycophancy and used these scores as a reward signal to train the model. In Revolgy’s summary, gpt‑5‑main shows a 69 % decrease in sycophancy for free users and 75 % for paid users compared to GPT‑4o.
Safety testing and red‑teaming
GPT‑5 underwent extensive testing. The system card reports over 5,000 hours of red‑teaming and additional testing with external organizations such as Microsoft’s AI Red Team and the UK AI Safety Institute. OpenAI’s preparedness framework labels gpt‑5‑thinking as “High capability” in biological and chemical domains; this triggers additional safeguards and monitoring. To detect harmful content, the system uses a two‑step classifier and reasoning mechanism and can enforce account‑level bans for misuse. Despite improvements, prompt injection remains an unsolved issue; tests show that even gpt‑5‑thinking has a 56.8 % attack success rate in red‑team evaluations.
Capabilities and performance
Reasoning and coding
OpenAI claims GPT‑5 is “smarter, faster and more accurate” than prior models. Independent testers corroborate these improvements:
Coding: GPT‑5 achieves 74.9 % on the SWE‑Bench Verified benchmark, outperforming GPT‑4.1 (≈54 %) and edging out Anthropic’s Claude Opus 4.1. It also scores 88 % on the Aider Polyglot benchmark for code editing and 89.4 % on the GPQA Diamond science reasoning benchmark. The model can generate complete web applications or games from a single prompt, demonstrated in live demos where GPT‑5 produced a French‑learning website and a financial dashboard within minutes.
Mathematics: On the AIME 2025 math contest benchmark, GPT‑5 solves 94.6 % of problems.
Health and science: GPT‑5‑thinking scores 46.2 % on HealthBench Hard, up from 31.6 % for the o3 model. Revolgy notes that gpt‑5‑thinking produces 20 % fewer factual errors than GPT‑4o, and gpt‑5‑main reduces major factual errors by 44 %.
Context and multilingual performance: The models can process long contexts (up to 400 k tokens in the API) and maintain high accuracy (>90 %) on 128 k‑token retrieval tasks. GPT‑5 matches state‑of‑the‑art systems on multilingual MMLU benchmarks across 13 languages.
Comparison with previous and competing models
| Model | Input tokens (API) | Output tokens | Key strengths | Source |
| --- | --- | --- | --- | --- |
| GPT‑5 (full) | 272 k | 128 k | Highest reasoning depth; best coding/health benchmarks; agentic tool use | OpenAI via Simon Willison & Neoteric |
| GPT‑5‑mini | 272 k | 128 k | Balanced speed and quality; suitable for everyday drafting and mid‑complexity tasks | Neoteric |
| GPT‑5‑nano | 272 k | 128 k | Ultra‑cheap, high‑throughput generation for templated tasks | Neoteric |
| GPT‑4o | 200 k (o3: 200 k) | 100 k (approx.) | Fast multimodal responses; replaced by gpt‑5‑main | Wired & Simon Willison |
| Anthropic Claude Opus 4.1 | – | – | Strong reasoning but higher cost; 74.5 % on SWE‑Bench Verified | Neoteric |
| Gemini 2.5 Pro | – | – | Competitive model from Google; 59.6 % on SWE‑Bench Verified | Neoteric |
New API features and developer controls
- Verbosity & reasoning_effort: Developers can set the verbosity to choose terse, balanced or expansive answers and adjust the reasoning effort to control how much the model thinks.
- Free‑form function calling: The API accepts raw text (e.g., Python code or SQL queries) for custom tools and can enforce output formats via context‑free grammars.
- Responses API: Maintains reasoning state across tool calls, improving multi‑step tasks.
- Reasoning traces: Developers can request reasoning summaries via the Responses API, enabling some transparency into the model’s internal chain of thought.
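Put together, the controls above might look like the following request payload. The parameter names (reasoning effort, verbosity) follow this report; the exact payload shape is an assumption for illustration and should be checked against OpenAI’s API reference before use.

```python
# Illustrative Responses API payload combining the developer controls
# listed above. The shape is assumed, not copied from OpenAI docs.

payload = {
    "model": "gpt-5",
    "input": "Summarise the attached report.",
    "reasoning": {"effort": "minimal"},  # minimal / low / medium / high
    "text": {"verbosity": "low"},        # terse vs expansive answers
}

print(sorted(payload))  # ['input', 'model', 'reasoning', 'text']
```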
Pricing and access
OpenAI uses tiered pricing for GPT‑5. According to Wired, API pricing is $1.25 per million input tokens and $10 per million output tokens for gpt‑5; $0.25 / $2.00 for gpt‑5‑mini; and $0.05 / $0.40 for gpt‑5‑nano. Neoteric’s breakdown emphasises the trade‑offs: the full model is best for complex tasks, mini for balanced everyday work, and nano for high‑volume templated generation.
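The per‑million‑token prices above translate directly into a cost estimate per request. The rates are the ones quoted in this report; the helper is a simple sketch.

```python
# API prices quoted above, in USD per 1M (input, output) tokens.
PRICES = {
    "gpt-5":      (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# 100k tokens in, 10k out on the full model:
print(round(cost_usd("gpt-5", 100_000, 10_000), 4))  # 0.225
```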
In ChatGPT, access is governed by subscription tiers:
- Free – Limited number of GPT‑5 messages (10 messages every 5 hours) and one thinking message per day.
- Plus (US $20/month) – Up to 80 GPT‑5 messages every 3 hours and 200 thinking messages per week.
- Pro (US $200/month) – Unlimited access to GPT‑5 and GPT‑5 Thinking with an expanded context window.
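Quotas like “80 messages every 3 hours” are naturally modelled as a sliding window. The limit below is the Plus figure quoted above; the enforcement mechanism itself is an assumption for illustration, since OpenAI has not published its rate‑limiting logic.

```python
from collections import deque

WINDOW_SECONDS = 3 * 3600  # 3-hour rolling window (Plus tier)
LIMIT = 80                 # 80 GPT-5 messages per window

def allowed(timestamps: deque, now: float) -> bool:
    """Drop expired entries, then check the rolling quota."""
    while timestamps and now - timestamps[0] >= WINDOW_SECONDS:
        timestamps.popleft()
    if len(timestamps) < LIMIT:
        timestamps.append(now)
        return True
    return False

sent = deque()
print(all(allowed(sent, t) for t in range(80)))  # True: first 80 pass
print(allowed(sent, 80))                         # False: quota hit
print(allowed(sent, WINDOW_SECONDS + 1))         # True: window rolled over
```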
Use cases and demonstrations
Software development and automation
Demonstrations during the launch showed GPT‑5 building complete web applications and games from single prompts, including a French‑learning site with quizzes and an interactive financial dashboard. The model can plan tasks, write code, run builds, debug errors and iterate, effectively performing end‑to‑end software creation. The concept of “software on demand” is emphasised by Sam Altman and early testers.
Health and scientific research
GPT‑5 is described as an “active thought partner” for medical queries. OpenAI evaluates it on the new HealthBench suite of physician‑generated questions; on the HealthBench Hard subset, GPT‑5‑thinking scored 46.2 %, vastly outperforming GPT‑4o’s 0 %. The system card indicates it is OpenAI’s “best model yet for health‑related questions”.
Creative writing and entertainment
GPT‑5 shows improved performance in creative tasks such as stand‑up comedy, rap battles, haiku and poetry. Testers note stronger emotional impact and clearer imagery in generated content. New ChatGPT features allow users to choose one of four preset personalities—Cynic, Robot, Listener and Nerd—and to select chat thread colours. These features aim to personalise interactions.
Personal assistant integration
OpenAI plans to integrate ChatGPT with Gmail, Google Contacts and Calendar. Pro users will receive this integration first, with other tiers to follow. During the launch, GPT‑5 scheduled a marathon training run, responded to emails and created a travel packing list, illustrating its potential as a personal assistant.
Limitations, criticisms and unresolved issues
- Not AGI yet: Sam Altman emphasised that GPT‑5 does not constitute artificial general intelligence; the model still cannot learn continuously and lacks key traits required for AGI.
- Prompt injection: Despite improvements, the model remains vulnerable. Red‑team assessments reported a 56.8 % attack success rate for gpt‑5‑thinking, highlighting that prompt injection remains an open research problem.
- Long‑context degradation and cost: Although the API can handle 400 k‑token inputs, ChatGPT restricts the context window to lower values for free and Plus tiers due to latency and cost concerns. OpenAI hints that larger windows may become available to enterprise customers.
- Opaque training data: OpenAI provides only high‑level descriptions of the training data. Without transparency on sources and parameter counts, it is difficult for external researchers to evaluate potential biases and data quality.
- Safety vs usefulness trade‑offs: Safe‑completions and reduced sycophancy might sometimes result in less detailed or more cautious responses, potentially limiting utility in certain domains.
- Cost and accessibility: The Pro tier’s price of $200/month may limit access for individuals and small organisations. Additionally, the full 400 k context window is only available via the API, which incurs token‑based charges.

