
Microsoft Copilot Context Window: Maximum Token Limits, Context Retention, Conversation Length, And Memory Handling


Microsoft Copilot’s context management and memory features are central to how the assistant interprets user input, maintains relevance, and delivers responses that draw on chat history, organizational data, and personalized facts. The structure of the Copilot context window and its retention mechanisms differ by product, underlying model, and the orchestration that governs the integration of retrieved content, chat history, and explicit memory.

·····

Microsoft Copilot uses layered and variable context windows across its surfaces.

The context window in Microsoft Copilot does not operate on a single, universal token limit. Each Copilot experience assembles its working context from the user’s current prompt, system instructions, chat history, organizational snippets retrieved from sources like Microsoft Graph, and the generated response. Consumer Copilot on copilot.microsoft.com, Microsoft 365 Copilot inside Word, Excel, Outlook, and Teams, Copilot Studio for custom agents, GitHub Copilot Chat, and embedded Copilot instances in products like Microsoft Fabric each manage their own context window, shaped by the chosen language model and the orchestration layer of that product surface.

Token limits are explicit only in developer-facing environments or where model choices are user-selectable. In most enterprise and consumer applications, Copilot enforces effective limits by truncating old conversation turns, summarizing long threads, and selectively retrieving the most relevant data from connected sources.

·····

Maximum token limits are only published for select Copilot products and models.

In Copilot Studio, model selection controls expose the precise token window for each model option, ranging from 128,000 tokens to 400,000 tokens for advanced reasoning models, with some third-party models listed at 200,000 tokens. These values define the sum of all prompt content, chat history, system prompts, retrievals, and the model’s output.
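Because a published window covers input and output together, an agent builder has to budget tokens across all of these components. The arithmetic can be sketched as follows; the component sizes and the 128,000-token window are illustrative values, not Copilot Studio's actual accounting:

```python
# Illustrative token budgeting for a fixed context window.
# The window covers everything at once: system instructions, chat
# history, retrieved snippets, the user's message, and the reply.
CONTEXT_WINDOW = 128_000  # e.g. the smallest Copilot Studio option

def available_output_tokens(system: int, history: int,
                            retrieved: int, user_msg: int) -> int:
    """Tokens left for the model's response after all inputs."""
    used = system + history + retrieved + user_msg
    return max(CONTEXT_WINDOW - used, 0)

# Example: a long session with heavy retrieval still leaves room.
remaining = available_output_tokens(
    system=2_000, history=60_000, retrieved=40_000, user_msg=1_000
)
print(remaining)  # 25000 tokens left for the reply
```

When the inputs alone would exceed the window, the budget for the reply bottoms out at zero, which is why orchestration must prune history or retrievals before that point.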

GitHub Copilot Chat publishes a 64,000-token window for users working with advanced models such as GPT-4o, supporting deep code context and long, multi-file conversations. This window applies consistently across GitHub’s website, IDE integrations, and mobile or CLI tools.

Microsoft 365 Copilot and consumer Copilot products do not share a single published token window. Instead, their orchestration layers dynamically assemble the working prompt, prune less relevant content, and inject the most pertinent organizational or document snippets.

........

Copilot context window reference table

| Surface | Maximum Token Limit | Context Composition | Additional Constraints |
|---|---|---|---|
| Copilot Studio | 128,000–400,000 tokens | Instructions, chat, retrieved data, response | Execution timeouts, quotas, file limits |
| GitHub Copilot Chat | 64,000 tokens | Chat, codebase, IDE state, snippet context | Multi-file, repo, and code navigation |
| Microsoft 365 Copilot | Not published | Chat, retrievals, app context | Document-length and retrieval guidance |
| Consumer Copilot | Not published | Chat, summary, prompt | 18-month conversation history retention |
| Embedded Copilot | Not published | Chat, metadata, retrieval | Grounding data, product rules |

·····

Microsoft 365 Copilot applies document-length guidance and selective retrieval in place of fixed token limits.

In Microsoft 365 Copilot’s business apps, context assembly is guided by document size and retrieval relevance. For summarization and referencing, the recommended practical upper bound is about 1.5 million words or 300 pages. For rewriting, editing, or highly focused tasks, Microsoft advises keeping input to 3,000 words or less for best results.

The system relies on retrieval-augmented generation, where relevant segments are extracted from SharePoint, OneDrive, Exchange, or the current document. Only these extracts are injected into the working prompt, so even large documents or knowledge bases can be referenced without exceeding technical or practical token limits. This orchestration ensures Copilot’s responses remain relevant and tractable, even with massive organizational data.
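The retrieval step can be pictured as selecting the highest-relevance extracts that fit a token budget. The sketch below is a simplified model of that selection, not Microsoft's orchestration code; the sources, scores, and token counts are stand-ins:

```python
# Simplified model of retrieval-augmented prompt assembly:
# rank candidate snippets by relevance, then greedily pack
# the best ones into a fixed token budget.
from dataclasses import dataclass

@dataclass
class Snippet:
    source: str       # e.g. "SharePoint", "OneDrive", "Exchange"
    text: str
    relevance: float  # score from the retrieval index (stand-in)
    tokens: int       # estimated token count

def select_snippets(candidates: list[Snippet], budget: int) -> list[Snippet]:
    chosen, used = [], 0
    for s in sorted(candidates, key=lambda s: s.relevance, reverse=True):
        if used + s.tokens <= budget:
            chosen.append(s)
            used += s.tokens
    return chosen

# Only the selected extracts are injected into the working prompt,
# so the source documents themselves never need to fit in the window.
docs = [
    Snippet("SharePoint", "Q3 revenue summary...", 0.92, 400),
    Snippet("OneDrive", "Old draft, tangential...", 0.35, 900),
    Snippet("Exchange", "Thread on Q3 targets...", 0.81, 600),
]
picked = select_snippets(docs, budget=1_000)
print([s.source for s in picked])  # ['SharePoint', 'Exchange']
```

The low-relevance OneDrive draft is skipped even though it would fit on its own, which mirrors why off-target retrieval narrows what Copilot can "see" for a given prompt.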

·····

Retrieval and orchestration play a dominant role in shaping context and memory.

In modern Copilot environments, the token window is only one component of effective context. Copilot uses dynamic retrieval to fetch and prioritize snippets from organizational content, such as files, messages, or data connectors. The orchestrator combines the latest user message, recent chat history, and retrieved snippets, summarizing or truncating older or less-relevant material to fit within model limits.

This retrieval-centric context design allows Copilot to “remember” or reference knowledge well beyond what fits in the current chat transcript. If a user asks a follow-up that depends on enterprise data, Copilot can surface pertinent facts by querying Microsoft 365 sources, even if that information was never present in the chat history.

Conversely, when retrieval returns incomplete or off-target snippets, Copilot’s context narrows, and it may not recall facts present elsewhere in the organization.

·····

Conversation length and chat retention are governed by context windows, orchestration, and runtime policies.

The length of a Copilot conversation is constrained by the maximum working context window, orchestration rules for pruning and summarizing chat history, prompt runtime ceilings, and quotas for throughput in custom agent environments. As conversations grow, Copilot incrementally drops or summarizes the oldest exchanges to accommodate new user input, retrievals, and system instructions within the window.

In Copilot Studio and custom deployments, additional controls apply, including quotas for requests per minute, instruction length, and message throughput. Product-specific Copilot instances may also enforce their own execution timeouts or compliance-driven retention periods.

........

Copilot conversation length and retention table

| Surface | Max Conversation Length | Pruning/Summarization Method | Retention Policy |
|---|---|---|---|
| GitHub Copilot Chat | 64,000 tokens (rolling) | Drops/summarizes old code and chat | Per-session, product history |
| Copilot Studio | Model- and quota-limited | Summarizes/prunes chat | Tenant logs, quotas |
| Microsoft 365 Copilot | Retrieval-driven | Truncates, retrieves key snippets | Purview or enterprise policy |
| Consumer Copilot | Not published | Summarizes and truncates chat window | 18 months default retention |
| Embedded Copilot | Product-dependent | Experience-driven pruning | Product/tenant compliance |

·····

Memory in Copilot is a distinct personalization layer, not a substitute for full chat history.

Copilot memory, available in Microsoft 365 Copilot, stores select user facts, preferences, and recurring topics independently of chat history. Memory entries are added intentionally by the user or extracted from chats, and the user can review, edit, or delete them at any time.

When memory is enabled, Copilot can inject these facts as part of the working context for future prompts, improving personalization and continuity without requiring all past conversation turns to remain in the context window. Memory is not governed by Microsoft Purview retention labels or enterprise compliance rules and persists until managed or deleted by the user.
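Functionally, memory behaves like a small, user-editable fact store whose entries ride along with each new prompt. The model below is a conceptual sketch; the store, its methods, and the prompt layout are hypothetical, not Copilot's actual interface:

```python
# Conceptual model of a memory layer: a compact, user-managed fact
# store injected into each prompt, independent of full chat history.
class MemoryStore:
    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    def remember(self, key: str, fact: str) -> None:
        self._facts[key] = fact          # user-curated or chat-extracted

    def forget(self, key: str) -> None:
        self._facts.pop(key, None)       # user can delete items

    def as_context(self) -> str:
        return "\n".join(f"- {f}" for f in self._facts.values())

def build_prompt(memory: MemoryStore, user_msg: str) -> str:
    # Memory facts are injected every time; past turns are not.
    return (f"Known about this user:\n{memory.as_context()}\n\n"
            f"User: {user_msg}")

mem = MemoryStore()
mem.remember("role", "User works on the finance team")
mem.remember("format", "Prefers bulleted summaries")
print(build_prompt(mem, "Summarize this report"))
```

Because the store holds a handful of distilled facts rather than whole transcripts, it adds personalization at a tiny token cost compared with replaying chat history.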

The existence of memory means Copilot can reference and build upon previous user interactions and personalizations in a compact, relevant way.

·····

Conversation history and data retention policies vary by Copilot environment and compliance configuration.

The storage and retention of Copilot conversation logs differ from the system’s operational memory and context window. In consumer Copilot, conversation histories are stored for up to 18 months when users are signed in, enabling review and deletion but not guaranteeing the inclusion of all history in future prompts.

For enterprise users, retention of Copilot chat logs and data is managed by Microsoft Purview or other compliance tools, which can set automatic deletion, retention holds, or auditing requirements. Copilot memory remains outside these controls as a personalization layer.

In embedded or product-specific Copilot deployments, retention and context rules may be shaped by application needs, tenant policy, and local security governance.

·····

Enterprise, developer, and embedded Copilot environments rely on a combination of token windows, retrieval limits, orchestration, and quotas.

In Copilot Studio and embedded Copilot products, model context windows are enforced alongside orchestration and runtime controls. The context for each prompt is assembled from current user input, relevant history, retrievals from connected content, and product metadata. Additional limits, such as execution timeouts, request quotas, file size caps, and security protocols, further shape the real-world capacity for memory and context in these environments.

Microsoft’s design allows Copilot to adapt to high-volume, large-scale enterprise use cases while remaining secure and compliant with organizational policies and user privacy requirements.

........

Summary table: Microsoft Copilot context, memory, and retention facts

| Dimension | Microsoft 365 Copilot | Copilot Studio | GitHub Copilot Chat | Consumer Copilot |
|---|---|---|---|---|
| Max context window | Retrieval and orchestration, ~300 pages practical | 128K–400K tokens (model) | 64K tokens (GPT-4o) | Not published |
| Chat history retention | Enterprise and compliance policy | Tenant logs and quotas | Session and product policy | 18 months (default) |
| Memory support | Yes, user-managed | Agent configuration | Not primary | Not primary |
| Retrieval/grounding | Yes, via Microsoft Graph | Yes, per agent | Yes, for code/repo | Limited |
| Throttling/quotas | Yes, enterprise and product | Yes, runtime, tokens | Session-based | Not published |

·····

DATA STUDIOS