
Microsoft Copilot context window: token limits, memory policy, and 2025 rules.


Copilot’s context window depends on the model it selects and the orchestration layer.

Microsoft Copilot doesn’t use a fixed model or static token limit—it dynamically chooses the best model for each task. In 2025, Copilot across Microsoft 365 and the web is powered primarily by GPT-4o, with ongoing migration to GPT-5.


These models enforce hard token windows: GPT-4o offers a 128,000-token context window, with output capped at 16,384 tokens per request. Copilot also uses an orchestration layer (such as Microsoft Graph, Bing, or Prometheus) that adds system messages and retrieved content, all of which consume tokens within the overall limit.
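Copilot does not surface a token counter, but you can roughly estimate how much of that budget a prompt will consume. The sketch below is a planning aid only, assuming the active model is GPT-4o and using tiktoken's o200k_base encoding (GPT-4o's tokenizer) as a stand-in; the orchestration overhead figure is an assumption, not a published value:

import tiktoken

GPT4O_CONTEXT = 128_000    # total context window reported for GPT-4o
GPT4O_MAX_OUTPUT = 16_384  # maximum completion tokens per request

def estimate_budget(prompt: str, expected_output_tokens: int = 2_000) -> dict:
    enc = tiktoken.get_encoding("o200k_base")
    prompt_tokens = len(enc.encode(prompt))
    # System messages and grounded content added by the orchestration layer also
    # consume tokens; reserve a rough allowance for them (assumption, not published).
    orchestration_overhead = 3_000
    used = prompt_tokens + expected_output_tokens + orchestration_overhead
    return {
        "prompt_tokens": prompt_tokens,
        "estimated_total": used,
        "fits_in_window": used <= GPT4O_CONTEXT and expected_output_tokens <= GPT4O_MAX_OUTPUT,
    }

print(estimate_budget("Summarize the attached quarterly report in five bullet points."))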



The model behind Copilot determines the maximum window for each prompt.

If Copilot uses GPT-4o, your prompt and its completion must together stay under the 128k token budget. When GPT-5 is used (now selectively available in some interfaces), the context capacity is expected to be equal to or larger, though no official limit has been disclosed for it. In most user environments, Copilot selects the model internally (you don't pick it), and token accounting follows the active model's limits, including tokens spent on summarization, tool calling, and grounding.
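Because the active model is not exposed, any client-side planning has to treat the limits as model-dependent. A minimal sketch of that bookkeeping follows; the GPT-5 entries are deliberately left as None because no official figures exist, and the overhead value is an assumption:

# Copilot selects the model internally, so treat limits as model-dependent.
MODEL_LIMITS = {
    "gpt-4o": {"context": 128_000, "max_output": 16_384},
    "gpt-5":  {"context": None,    "max_output": None},  # not officially disclosed
}

def usable_input_tokens(model: str, reserved_output: int, overhead: int = 3_000):
    limits = MODEL_LIMITS[model]
    if limits["context"] is None:
        # Unknown window: plan conservatively, e.g. assume a GPT-4o-sized budget.
        return None
    return limits["context"] - reserved_output - overhead

print(usable_input_tokens("gpt-4o", reserved_output=4_000))  # 121000 tokens left for input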



Token limits are not always visible, but practical thresholds exist.

Copilot does not display token counters, but practical limits show up through input restrictions or failure to generate long responses. Reports from users and Microsoft documentation suggest varying limits depending on where and how you use Copilot (a rough pre-check is sketched after the list):

  • Copilot on the web or via Microsoft Edge: reported prompt limits of 4k–8k characters, with higher limits (up to 16k) in some rollout waves.

  • Copilot in Teams/Outlook Chat: supports longer inputs and outputs in document-oriented tasks.

  • Copilot for Microsoft 365 (Enterprise): follows model limits but also leverages internal knowledge from Microsoft Graph, reducing the need for verbose prompts.
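Since these figures are reported rather than official, a simple character-count pre-check is about as precise as client-side planning can get. The limits in the sketch below reflect the reports above and are assumptions that may change between rollout waves:

# Reported (not official) per-surface character limits from the notes above.
REPORTED_CHAR_LIMITS = {
    "copilot_web": 8_000,              # 4k-8k reported; up to 16k in some waves
    "copilot_teams_outlook": 16_000,   # assumption: longer inputs reported, exact cap unclear
    "copilot_m365_enterprise": 16_000, # assumption: effectively bounded by model limits
}

def prompt_fits(surface: str, prompt: str) -> bool:
    return len(prompt) <= REPORTED_CHAR_LIMITS[surface]

print(prompt_fits("copilot_web", "Draft a concise project status update for the leadership team."))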



Uploads, documents, and file context count toward the window.

When uploading files into Copilot (web or Microsoft 365), users are generally limited to 512 MB per file. On the free Copilot web version, limits of 3 files per 24 hours have been reported. Even though Copilot can index full documents, only the specific passages retrieved and referenced during a response count toward the active context window. In Microsoft 365 environments, files and emails retrieved via Graph grounding behave similarly: only the quoted and resolved fragments take up tokens.
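This is why a large upload does not overflow a 128k window: only the retrieved fragments enter the prompt. The sketch below illustrates the idea with a trivial keyword-overlap scorer standing in for Copilot's actual retrieval, which is not public; the o200k_base encoding is again an assumption tied to GPT-4o:

import tiktoken

enc = tiktoken.get_encoding("o200k_base")

def ground_question(question: str, document_chunks: list[str], top_k: int = 3) -> tuple[str, int]:
    # Trivial keyword-overlap scoring stands in for Copilot's real retrieval.
    q_terms = set(question.lower().split())
    scored = sorted(
        document_chunks,
        key=lambda chunk: len(q_terms & set(chunk.lower().split())),
        reverse=True,
    )
    grounding = "\n\n".join(scored[:top_k])       # only these fragments enter the prompt
    return grounding, len(enc.encode(grounding))  # and only they count toward the window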



Copilot for Microsoft 365 includes graph data, but context limits still apply.

Copilot in enterprise environments accesses your documents, emails, meetings, chats, and more through Microsoft Graph. This allows it to deliver personalized and context-aware responses. However, this does not expand the LLM's context window; it merely selects and injects relevant pieces of data into the prompt. Each grounded reference still takes up tokens and must fit within the model's overall capacity, so precision in task definition remains critical.
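In other words, grounding is prompt assembly, not extra memory. The layout below is an assumption for illustration only; Copilot's internal message format is not public:

import tiktoken

enc = tiktoken.get_encoding("o200k_base")

def assemble_grounded_prompt(system: str, grounded_refs: list[str], user_task: str) -> tuple[str, int]:
    # Grounded Graph fragments are injected alongside the system message and the task,
    # so they all share the same token budget.
    sections = [system] + [f"[reference]\n{ref}" for ref in grounded_refs] + [user_task]
    prompt = "\n\n".join(sections)
    return prompt, len(enc.encode(prompt))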



Copilot has memory-like features, but they don’t raise the token cap.

Microsoft Copilot remembers your previous turns within a session and may apply preferences or context from enterprise profiles. However, it does not persist memory across sessions unless that context is embedded in projects or shared data. In enterprise settings, organizations can govern memory and retention with tools like Microsoft Purview, ensuring prompts and generated outputs comply with organizational policy.


Context is lost when too much content is packed into one request.

If your combined input, expected output, system instructions, and retrieved documents exceed the model's limit, Copilot may truncate earlier turns, shorten responses, or fail to generate output altogether. Symptoms include replies that ignore earlier constraints, repeat instructions, or omit important references. In web and app versions, a too-long prompt may trigger errors or forced rephrasing. In Microsoft 365 apps, the assistant might skip previous references or return partial completions.
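The "forgotten constraint" symptom is what simple history trimming tends to look like from the outside: the oldest turns are dropped until the request fits. The sketch below mirrors that behavior as an assumption; Copilot's real truncation and summarization logic is not public:

import tiktoken

enc = tiktoken.get_encoding("o200k_base")

def trim_history(turns: list[str], budget_tokens: int) -> list[str]:
    kept: list[str] = []
    total = 0
    for turn in reversed(turns):      # keep the most recent turns first
        cost = len(enc.encode(turn))
        if total + cost > budget_tokens:
            break                     # older turns, and the constraints they carried, drop out
        kept.append(turn)
        total += cost
    return list(reversed(kept))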



Content length recommendations help stay within safe bounds.

Microsoft’s guidance includes approximate planning ranges based on pages or word count, which translate indirectly into tokens:

  • Document summarization: works best for files up to ~300 pages.

  • Asking questions on content: < 7,500 words (~10k tokens).

  • Rewriting or generating new text: < 3,000 words (~4k tokens).


    These are not fixed limits but practical recommendations to ensure accuracy and completeness within the token budget.
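If you prefer to plan in tokens rather than words, the ratios implied above (roughly 1.33 tokens per English word, so 7,500 words is about 10k tokens) give a quick conversion. This is a heuristic, not a limit enforced by Copilot:

# Rough conversion based on the planning ranges above (~1.33 tokens per word).
PLANNING_RANGES_WORDS = {
    "ask_questions_on_content": 7_500,
    "rewrite_or_generate_text": 3_000,
}

def words_to_tokens(words: int, tokens_per_word: float = 1.33) -> int:
    return round(words * tokens_per_word)

for task, max_words in PLANNING_RANGES_WORDS.items():
    print(f"{task}: about {words_to_tokens(max_words):,} tokens")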


Retention and compliance depend on your Copilot environment.

In consumer or free Copilot experiences, your conversations may be retained and reviewed by Microsoft to improve model quality. Activity and privacy settings allow users to turn off chat history, though some retention (up to 72 hours) is required for functionality. In Microsoft 365 enterprise environments, content is subject to organizational policies, and IT administrators can configure zero data retention, compliance journaling, and audit trails using tools like Purview and Entra ID.



Planning within Copilot's token window requires balancing retrieval and output.

  • Minimize prompt bloat: don't paste entire documents; ask targeted questions.

  • Avoid repeated content: rephrasing or reloading old instructions consumes more tokens.

  • Structure inputs clearly: segment instructions from background context.

  • Use Copilot’s grounding: let it find what it needs instead of forcing all data into the prompt.

  • Expect truncation at scale: GPT-4o and GPT-5 both follow hard input-output caps; a rough budget check is sketched after this list.
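Putting those tips together, a pre-flight check can keep instructions, background, and grounded material segmented and total them against the window before sending. The sketch assumes a GPT-4o-sized budget; the function name and overhead handling are illustrative, not part of any Copilot API:

import tiktoken

enc = tiktoken.get_encoding("o200k_base")
CONTEXT_WINDOW = 128_000  # assuming a GPT-4o-sized budget
MAX_OUTPUT = 16_384

def plan_request(instructions: str, background: str, grounded_refs: list[str],
                 expected_output_tokens: int) -> dict:
    # Keep instructions, background context, and grounded material segmented,
    # as recommended above, and count them against one shared window.
    prompt = "\n\n".join([instructions, background, *grounded_refs])
    prompt_tokens = len(enc.encode(prompt))
    output_budget = min(expected_output_tokens, MAX_OUTPUT)
    return {
        "prompt_tokens": prompt_tokens,
        "output_budget": output_budget,
        "within_window": prompt_tokens + output_budget <= CONTEXT_WINDOW,
    }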



Copilot’s intelligence relies on orchestration, but tokens still define its limits.

Behind the scenes, Copilot coordinates large models, enterprise data, and retrieval systems, but the language model still has a strict context window. Whether it is 128k in GPT-4o, something higher in GPT-5, or less in constrained consumer apps, the rule remains the same: everything counts, including your prompt, the assistant's response, and what it sees from the world around it. Managing this budget effectively is what makes Copilot feel coherent across long tasks, or lose track when overfilled.


