ChatGPT 5.4 Context Window, Long Documents, File-Heavy Work, and Output Limits: What the 1M Token Model Means in the API and What ChatGPT Actually Exposes in Practice

ChatGPT 5.4 context capacity cannot be described accurately with a single number, because OpenAI documents one set of limits for the underlying API model family and another set of product-managed limits for ChatGPT modes, workspaces, file workflows, and output behavior.
The most important distinction is that GPT-5.4 in the API is documented as a 1M-class context model, while ChatGPT surfaces often expose materially smaller practical limits depending on whether the user is in Thinking mode, Enterprise, Business, Projects, or standard chat.
That difference is what shapes real long-document work, file-heavy workflows, and response-length expectations, because the model’s raw theoretical capacity is not the same thing as the amount of text ChatGPT will actively stuff into context, preserve across a workflow, or emit in one answer.
·····
The official GPT-5.4 model context in the API is much larger than the practical context windows documented for ChatGPT product surfaces.
OpenAI’s API documentation describes GPT-5.4 as supporting a 1M token context window, and the model page lists GPT-5.4 and GPT-5.4 Pro in the 1.05M context class, which places them in the category of ultra-long-context models intended for very large prompts, long codebases, long document sets, and extended agent trajectories.
That number is the cleanest answer when the question is strictly about the model in the API.
It is not the cleanest answer when the question is about ChatGPT as a product, because ChatGPT layers additional product logic, plan logic, and mode-specific constraints on top of the underlying model family.
This is why discussions of “ChatGPT 5.4 context window” often become confusing.
People cite the 1M token API figure as if it directly describes every ChatGPT workflow, even though OpenAI’s help documentation shows that different ChatGPT surfaces expose smaller windows or separate operational limits for uploads, projects, and plan-specific usage.
........
API Model Context and ChatGPT Product Context Are Not the Same Limit
| Surface | Officially Documented Context Position |
| --- | --- |
| GPT-5.4 API | 1M token context window |
| GPT-5.4 / GPT-5.4 Pro model pages | 1.05M context class |
| ChatGPT Enterprise and Edu GPT-5.4 Thinking | 196K context window |
| ChatGPT Business Thinking and Pro | 196K context window |
| ChatGPT manual Thinking in release notes | 256K total context window with 128K input and 128K max output |
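These per-surface figures can be checked mechanically. The sketch below uses the documented limits from the table and a rough ~4-characters-per-token heuristic; the surface names and the heuristic are illustrative assumptions, not an official tokenizer or API field.

```python
# Rough check of whether a text fits a surface's documented context window.
# Limits are the figures cited in the table above; the ~4 chars/token
# ratio is a common rough heuristic, not an exact tokenizer.
CONTEXT_LIMITS = {
    "api_gpt_5_4": 1_000_000,        # API model: 1M-class window
    "enterprise_thinking": 196_000,  # Enterprise/Edu Thinking
    "business_thinking": 196_000,    # Business Thinking and Pro
    "manual_thinking_total": 256_000 # manual Thinking total window
}

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits(surface: str, text: str) -> bool:
    """True if the estimated token count fits the surface's documented window."""
    return estimate_tokens(text) <= CONTEXT_LIMITS[surface]
```

The same document can therefore "fit" in the API while overflowing a managed ChatGPT mode, which is exactly the confusion the table is meant to resolve.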
·····
ChatGPT product limits are mode-specific and workspace-specific rather than simple pass-throughs of the raw model maximum.
OpenAI’s ChatGPT help pages show that context availability changes depending on product tier and mode, which means a user’s practical long-context capacity depends not only on the selected model family but also on whether the session is running in a business workspace, an enterprise workspace, or a specific ChatGPT interaction mode.
ChatGPT Business currently documents 32K context for Instant and 196K for Thinking and Pro, while ChatGPT Enterprise and Edu document GPT-5.4 Thinking at 196K context, making it clear that the user-facing ChatGPT product does not simply expose the raw 1M-class API number as a universal default.
OpenAI’s release notes add another important detail by stating that manual Thinking in ChatGPT was expanded to a total context window of 256K, split into 128K input and 128K max output, which means at least some ChatGPT modes now have a larger managed window than older product defaults while still remaining far below the API model’s 1M-class ceiling.
That tells us something important about product design.
ChatGPT is being managed as a set of curated interaction modes rather than as a bare-metal interface to the full model specification, which means the practical ceiling users encounter in normal work is governed by product decisions, not just by the underlying model architecture.
·····
Output limits matter as much as input limits when the task involves long rewrites, exhaustive extraction, or large synthesized reports.
OpenAI’s release notes for ChatGPT manual Thinking are unusually explicit in stating both sides of the window, namely 128K input and 128K max output inside a 256K total context window, which is important because it shows that long-context capacity is not only about how much the model can read but also about how much it can emit in a single answer.
That distinction becomes critical in document-heavy work.
A model may be able to ingest a very large source file, multiple files, or a very long conversation history, while still being unable to return a complete transformed version of all of that material in one completion because the output ceiling is smaller than the total amount the model can understand.
OpenAI’s model pages for GPT-5.4 Mini and GPT-5.4 Nano also document 128K max output tokens, reinforcing the broader point that even in large-context families, output is a separate practical constraint that can bottleneck workflows such as full-document rewrites, full structured extraction, large code transformations, or long multi-section drafting in one pass.
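A minimal sketch of why the split window matters, using the 128K input / 128K output figures cited above; the `plan_fits` helper and the example token counts are illustrative assumptions, not product behavior.

```python
# Check a task against a split window (128K input + 128K max output,
# per the manual Thinking figures cited above). A full rewrite roughly
# preserves length, so a source the model can *read* may still exceed
# what one completion can *emit*.
MAX_INPUT = 128_000
MAX_OUTPUT = 128_000

def plan_fits(input_tokens: int, expected_output_tokens: int) -> dict:
    """Report which side of the window, if any, forces staged generation."""
    return {
        "input_ok": input_tokens <= MAX_INPUT,
        "output_ok": expected_output_tokens <= MAX_OUTPUT,
        "needs_staging": expected_output_tokens > MAX_OUTPUT,
    }

# Hypothetical rewrite task: input fits, but the full transformed
# output would not, so the deliverable must be produced in stages.
report = plan_fits(input_tokens=100_000, expected_output_tokens=150_000)
```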
The practical question therefore is never just whether ChatGPT 5.4 can read the material.
It is also whether the specific ChatGPT mode can emit the required result in one response without truncation, forced summarization, or staged generation across multiple turns.
........
Why Output Limits Become the Real Bottleneck in Long-Context Work
| Workflow Type | Main Constraint |
| --- | --- |
| Long source review | Input context and retrieval quality |
| Full-document rewrite | Output ceiling as much as input capacity |
| Exhaustive extraction across many pages | Output ceiling and formatting length |
| Multi-file synthesis report | Both retrieval quality and output ceiling |
| Large codebase transformation | Context size, output size, and staged generation needs |
·····
Long-document work and file-heavy work are related problems, but OpenAI’s documentation shows that they are not the same problem.
A single very long document mainly stresses the model’s context window, output limit, and ability to reason coherently across long spans of text, because the central challenge is whether the system can absorb enough of that document at once and still produce a sufficiently complete answer.
A file-heavy workflow stresses something different.
It depends on project-level file caps, simultaneous upload limits, selective incorporation of file contents, and the orchestration logic that decides how much of each uploaded file is actively stuffed into the model context and how much is instead retrieved or summarized when needed.
This is one of the most important distinctions for serious users.
Someone asking whether ChatGPT 5.4 can handle one long book-length document is asking a different capacity question from someone asking whether ChatGPT can coordinate work across dozens of PDFs, spreadsheets, presentations, and attachments inside a project.
The first problem is primarily about context and output ceilings.
The second is primarily about file architecture, workspace limits, and the product’s strategy for incorporating file content into the live reasoning process.
·····
OpenAI explicitly says that uploaded file text is not always fully stuffed into the model context.
OpenAI’s guidance for optimizing file uploads in ChatGPT Enterprise states that even when the model has a large maximum context window, ChatGPT does not necessarily place the full contents of uploaded files into that active context, and the amount of text actually stuffed varies by usage type.
That statement is crucial because it corrects one of the most common assumptions about file uploads.
Uploading a very large document or many files does not mean the entire combined text automatically sits inside the prompt at once in the live model context.
Instead, ChatGPT file handling appears to operate through a managed strategy that combines context stuffing, selective use of uploaded content, and workflow-specific orchestration rather than simple wholesale injection of every token from every file.
This means that long-document performance in ChatGPT is partly a model question and partly a product behavior question.
A large context window helps, but the actual experience depends on how ChatGPT decides to incorporate the uploaded material, which means raw context size alone does not fully predict how well a file-heavy or long-document workflow will behave.
·····
File-heavy work is governed by project and upload limits that can be more restrictive than raw model context.
OpenAI’s Projects documentation says Free users can have 5 files per project, Go and Plus users can have 25 files per project, and Edu, Pro, Business, and Enterprise users can have 40 files per project, while only 10 files can be uploaded at the same time.
Those limits matter more than the model’s token ceiling when a workflow depends on many separate assets.
A user may be nowhere near the theoretical maximum prompt size and still run into practical constraints simply because the project structure caps how many files can live inside the workspace or how many can be uploaded in one batch.
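The caps described above can be expressed as a simple validation step. This sketch uses the documented per-plan project caps (5 / 25 / 40 files) and the 10-file simultaneous upload limit; the function name and batching strategy are illustrative assumptions.

```python
# Validate an upload plan against the project file caps documented above
# and split it into batches that respect the 10-file simultaneous limit.
PROJECT_FILE_CAPS = {
    "free": 5,
    "go": 25, "plus": 25,
    "edu": 40, "pro": 40, "business": 40, "enterprise": 40,
}
MAX_SIMULTANEOUS_UPLOADS = 10

def upload_batches(plan: str, files: list[str]) -> list[list[str]]:
    """Split files into batches of at most 10 after checking the project cap."""
    cap = PROJECT_FILE_CAPS[plan]
    if len(files) > cap:
        raise ValueError(f"{len(files)} files exceeds the {cap}-file cap for {plan}")
    return [files[i:i + MAX_SIMULTANEOUS_UPLOADS]
            for i in range(0, len(files), MAX_SIMULTANEOUS_UPLOADS)]
```

Note that a 35-file Enterprise project, well under any token ceiling, still requires four upload batches: the binding constraint is workspace architecture, not model context.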
Custom GPTs are governed by another file system.
OpenAI’s documentation for creating and editing GPTs says a custom GPT can have up to 20 attached knowledge files and that each file can be up to 512 MB, which means the file limits for GPT knowledge attachments do not match the file limits for Projects in ChatGPT.
This is another reason the phrase “ChatGPT 5.4 file capacity” can mislead.
There is no one universal file limit because Projects, Custom GPT knowledge, and ordinary chat uploads operate under different product rules.
........
ChatGPT File Limits Depend on Product Surface
| Product Surface | Officially Documented File Limits |
| --- | --- |
| Projects on Free | 5 files per project |
| Projects on Go and Plus | 25 files per project |
| Projects on Edu, Pro, Business, and Enterprise | 40 files per project |
| Simultaneous project uploads | 10 files at one time |
| Custom GPT knowledge attachments | 20 files, up to 512 MB each |
·····
The practical ceiling for long documents is shaped by both context management and staged reasoning.
OpenAI’s API materials position GPT-5.4 as a model designed for very large contexts such as whole codebases, large document collections, and long agent traces, which means the underlying model family is clearly built for far longer spans of context than older mainstream models.
But in ChatGPT, the user interacts with a managed product that often breaks real work into phases.
A document may be uploaded, partially stuffed, selectively referenced, summarized, and then transformed through one or more turns rather than being handled as a single monolithic prompt-response cycle.
That means the effective long-document experience depends on more than the model’s memory span.
It also depends on whether the task is being executed in a mode with a larger managed window, whether the workspace plan allows enough file capacity, whether the document text is fully or partially incorporated, and whether the desired result fits inside the response ceiling without requiring staged output.
In other words, the real practical ceiling is not a single static number.
It is the result of how model capacity, product orchestration, file architecture, and output limits interact in the specific workflow the user is trying to run.
·····
File-heavy work is increasingly an orchestration problem rather than just a memory problem.
When users work with many files, the central challenge is not only whether the model can theoretically accept a very large number of tokens, but whether the product can reliably coordinate retrieval, file selection, summarization, and synthesis across separate artifacts while staying inside project caps and answer-length limits.
This is where the API and ChatGPT begin to diverge most clearly.
The API offers the more direct path to constructing extremely large prompts and controlling exactly what goes into the context window, while ChatGPT offers a more managed workflow in which OpenAI controls more of the context assembly and file incorporation logic.
That does not mean one is always better than the other.
It means they are optimized for different use cases.
The API is better suited to users who need deterministic control over prompt composition and large-scale structured input handling, while ChatGPT is better suited to interactive use cases where product-managed orchestration is more valuable than direct low-level control. This is an inference from the documented differences between the API model guides and ChatGPT help pages, rather than a sentence OpenAI states explicitly in those exact words.
·····
The output ceiling can quietly force multi-step workflows even when the model can read everything.
One of the most important practical consequences of the 128K max-output figure documented for manual Thinking is that users may need to split large deliverables into stages even when the model has no difficulty understanding the full source material.
That affects tasks such as rewriting very long reports, producing exhaustive line-by-line extractions, generating complete transformed versions of long source documents, or drafting very large structured deliverables from many files in one shot.
The constraint is subtle because it is easy to focus on the size of the input and forget that any complete answer must still fit into the maximum generated output.
For many professional workflows, that output boundary is the point at which the system shifts from “one pass” work to “phased” work.
This is why long-context claims can sound more powerful than the actual user experience feels.
The model may indeed be able to comprehend the full working set, but the final deliverable may still need to be broken into sections, batches, or iterative steps in order to fit inside the available output budget.
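The shift from one-pass to phased work can be sketched as a packing problem: section sizes are fitted greedily into passes that each stay inside the output budget. The 128,000 figure mirrors the max-output number cited above; sectioning and greedy packing are illustrative assumptions, not ChatGPT's actual generation strategy.

```python
# Phased generation sketch: pack section token counts into passes that
# each fit inside an output budget, so a deliverable larger than one
# completion is produced across multiple turns.
OUTPUT_BUDGET_TOKENS = 128_000

def stage_sections(section_token_counts: list[int],
                   budget: int = OUTPUT_BUDGET_TOKENS) -> list[list[int]]:
    """Greedily group section sizes into passes that fit the budget."""
    passes, current, used = [], [], 0
    for size in section_token_counts:
        if size > budget:
            raise ValueError("a single section exceeds the output budget")
        if used + size > budget:
            passes.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        passes.append(current)
    return passes
```

Three 60K-token sections, for example, need two passes even though all 180K tokens of source material fit comfortably inside the input window.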
........
The Most Important Limits Depend on the Type of Work
| Type of Work | The Main Practical Limit |
| --- | --- |
| One long source document | Context window and answer length |
| Many uploaded files in a project | File caps and selective incorporation |
| Full-document rewrite | Output ceiling |
| Large document set synthesis | Retrieval quality plus output ceiling |
| Deterministic large-prompt workflows | API-side prompt construction control |
·····
The cleanest factual conclusion is that ChatGPT 5.4 long-context work is shaped by product architecture as much as by model architecture.
OpenAI’s official materials support a very clear distinction.
GPT-5.4 in the API belongs to the 1M to 1.05M context class, while ChatGPT product surfaces document smaller managed windows such as 196K in some enterprise and business settings and 256K total context with 128K input plus 128K max output in manual Thinking.
OpenAI also makes clear that file handling in ChatGPT is selective rather than a guarantee of full prompt stuffing, and that project and knowledge-file caps vary by surface, which means file-heavy work is governed by workspace rules and orchestration logic as much as by the raw model context ceiling.
The result is that “ChatGPT 5.4 context window” is only a partial description of real capacity.
For long documents, the key variables are input context, selective file incorporation, and output ceiling.
For file-heavy work, the key variables are project file caps, simultaneous upload limits, retrieval behavior, and how much of the uploaded material is actually active in the live reasoning process.
The most accurate summary is therefore not that ChatGPT 5.4 has one context window.
It is that GPT-5.4 provides a very large model context in the API, while ChatGPT exposes smaller, mode-managed, and workflow-dependent capacities that make long-document work and file-heavy work feel like related but fundamentally different kinds of scale.
·····
DATA STUDIOS