
Microsoft Copilot Context Window: Maximum Token Limits, Context Retention, Conversation Length, And Memory Handling


Microsoft Copilot’s context management and memory features are central to how the assistant interprets user input, maintains relevance, and delivers responses that draw on chat history, organizational data, and personalized facts. The structure of the Copilot context window and its retention mechanisms differ by product, underlying model, and the orchestration that governs the integration of retrieved content, chat history, and explicit memory.

·····

Microsoft Copilot uses layered and variable context windows across its surfaces.

The context window in Microsoft Copilot does not operate on a single, universal token limit. Each Copilot experience assembles its working context from the user’s current prompt, system instructions, chat history, organizational snippets retrieved from sources like Microsoft Graph, and the generated response. Consumer Copilot on copilot.microsoft.com, Microsoft 365 Copilot inside Word, Excel, Outlook, and Teams, Copilot Studio for custom agents, GitHub Copilot Chat, and embedded Copilot instances in products like Microsoft Fabric each manage their own context window, shaped by the chosen language model and the orchestration layer of that product surface.

Token limits are explicit only in developer-facing environments or where model choices are user-selectable. In most enterprise and consumer applications, Copilot enforces effective limits by truncating old conversation turns, summarizing long threads, and selectively retrieving the most relevant data from connected sources.

·····

Maximum token limits are only published for select Copilot products and models.

In Copilot Studio, model selection controls expose the precise token window for each model option, ranging from 128,000 tokens to 400,000 tokens for advanced reasoning models, with some third-party models listed at 200,000 tokens. These values define the sum of all prompt content, chat history, system prompts, retrievals, and the model’s output.
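Because a published window covers input and output together, an agent builder has to budget tokens across all of these components. The arithmetic can be sketched as follows; the component sizes and the 128,000-token window are illustrative values, not Copilot Studio's actual accounting:

```python
# Illustrative token budgeting for a fixed context window.
# The window covers everything at once: system instructions, chat
# history, retrieved snippets, the user's message, and the reply.
CONTEXT_WINDOW = 128_000  # e.g. the smallest Copilot Studio option

def available_output_tokens(system: int, history: int,
                            retrieved: int, user_msg: int) -> int:
    """Tokens left for the model's response after all inputs."""
    used = system + history + retrieved + user_msg
    return max(CONTEXT_WINDOW - used, 0)

# Example: a long session with heavy retrieval still leaves room.
remaining = available_output_tokens(
    system=2_000, history=60_000, retrieved=40_000, user_msg=1_000
)
print(remaining)  # 25000 tokens left for the reply
```

When the inputs alone would exceed the window, the budget for the reply bottoms out at zero, which is why orchestration must prune history or retrievals before that point.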

GitHub Copilot Chat publishes a 64,000-token window for users working with advanced models such as GPT-4o, supporting deep code context and long, multi-file conversations. This window applies consistently across GitHub’s website, IDE integrations, and mobile or CLI tools.

Microsoft 365 Copilot and consumer Copilot products do not share a single published token window. Instead, their orchestration layers dynamically assemble the working prompt, prune less relevant content, and inject the most pertinent organizational or document snippets.

........

Copilot context window reference table

| Surface | Maximum Token Limit | Context Composition | Additional Constraints |
|---|---|---|---|
| Copilot Studio | 128,000–400,000 tokens | Instructions, chat, retrieved data, response | Execution timeouts, quotas, file limits |
| GitHub Copilot Chat | 64,000 tokens | Chat, codebase, IDE state, snippet context | Multi-file, repo, and code navigation |
| Microsoft 365 Copilot | Not published | Chat, retrievals, app context | Document-length and retrieval guidance |
| Consumer Copilot | Not published | Chat, summary, prompt | 18-month conversation history retention |
| Embedded Copilot | Not published | Chat, metadata, retrieval | Grounding data, product rules |

·····

Microsoft 365 Copilot applies document-length guidance and selective retrieval in place of fixed token limits.

In Microsoft 365 Copilot’s business apps, context assembly is guided by document size and retrieval relevance. For summarization and referencing, the recommended practical upper bound is about 1.5 million words or 300 pages. For rewriting, editing, or highly focused tasks, Microsoft advises keeping input to 3,000 words or less for best results.

The system relies on retrieval-augmented generation, where relevant segments are extracted from SharePoint, OneDrive, Exchange, or the current document. Only these extracts are injected into the working prompt, so even large documents or knowledge bases can be referenced without exceeding technical or practical token limits. This orchestration ensures Copilot’s responses remain relevant and tractable, even with massive organizational data.
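The retrieval step can be pictured as selecting the highest-relevance extracts that fit a token budget. The sketch below is a simplified model of that selection, not Microsoft's orchestration code; the sources, scores, and token counts are stand-ins:

```python
# Simplified model of retrieval-augmented prompt assembly:
# rank candidate snippets by relevance, then greedily pack
# the best ones into a fixed token budget.
from dataclasses import dataclass

@dataclass
class Snippet:
    source: str       # e.g. "SharePoint", "OneDrive", "Exchange"
    text: str
    relevance: float  # score from the retrieval index (stand-in)
    tokens: int       # estimated token count

def select_snippets(candidates: list[Snippet], budget: int) -> list[Snippet]:
    chosen, used = [], 0
    for s in sorted(candidates, key=lambda s: s.relevance, reverse=True):
        if used + s.tokens <= budget:
            chosen.append(s)
            used += s.tokens
    return chosen

# Only the selected extracts are injected into the working prompt,
# so the source documents themselves never need to fit in the window.
docs = [
    Snippet("SharePoint", "Q3 revenue summary...", 0.92, 400),
    Snippet("OneDrive", "Old draft, tangential...", 0.35, 900),
    Snippet("Exchange", "Thread on Q3 targets...", 0.81, 600),
]
picked = select_snippets(docs, budget=1_000)
print([s.source for s in picked])  # ['SharePoint', 'Exchange']
```

The low-relevance OneDrive draft is skipped even though it would fit on its own, which mirrors why off-target retrieval narrows what Copilot can "see" for a given prompt.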

·····

Retrieval and orchestration play a dominant role in shaping context and memory.

In modern Copilot environments, the token window is only one component of effective context. Copilot uses dynamic retrieval to fetch and prioritize snippets from organizational content, such as files, messages, or data connectors. The orchestrator combines the latest user message, recent chat history, and retrieved snippets, summarizing or truncating older or less-relevant material to fit within model limits.

This retrieval-centric context design allows Copilot to “remember” or reference knowledge well beyond what fits in the current chat transcript. If a user asks a follow-up that depends on enterprise data, Copilot can surface pertinent facts by querying Microsoft 365 sources, even if that information was never present in the chat history.

Conversely, when retrieval returns incomplete or off-target snippets, Copilot’s context narrows, and it may not recall facts present elsewhere in the organization.

·····

Conversation length and chat retention are governed by context windows, orchestration, and runtime policies.

The length of a Copilot conversation is constrained by the maximum working context window, orchestration rules for pruning and summarizing chat history, prompt runtime ceilings, and quotas for throughput in custom agent environments. As conversations grow, Copilot incrementally drops or summarizes the oldest exchanges to accommodate new user input, retrievals, and system instructions within the window.

In Copilot Studio and custom deployments, additional controls apply, including quotas for requests per minute, instruction length, and message throughput. Product-specific Copilot instances may also enforce their own execution timeouts or compliance-driven retention periods.

........

Copilot conversation length and retention table

| Surface | Max Conversation Length | Pruning/Summarization Method | Retention Policy |
|---|---|---|---|
| GitHub Copilot Chat | 64,000 tokens (rolling) | Drops/summarizes old code and chat | Per-session, product history |
| Copilot Studio | Model- and quota-limited | Summarizes/prunes chat | Tenant logs, quotas |
| Microsoft 365 Copilot | Retrieval-driven | Truncates, retrieves key snippets | Purview or enterprise policy |
| Consumer Copilot | Not published | Summarizes and truncates chat window | 18 months default retention |
| Embedded Copilot | Product-dependent | Experience-driven pruning | Product/tenant compliance |

·····

Memory in Copilot is a distinct personalization layer, not a substitute for full chat history.

Copilot memory, available in Microsoft 365 Copilot, stores select user facts, preferences, and recurring topics independently of chat history. Memory entries are added intentionally by the user or extracted from chats, and the user can review, edit, or delete them at any time.

When memory is enabled, Copilot can inject these facts as part of the working context for future prompts, improving personalization and continuity without requiring all past conversation turns to remain in the context window. Memory is not governed by Microsoft Purview retention labels or enterprise compliance rules and persists until managed or deleted by the user.
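Functionally, memory behaves like a small, user-editable fact store whose entries ride along with each new prompt. The model below is a conceptual sketch; the store, its methods, and the prompt layout are hypothetical, not Copilot's actual interface:

```python
# Conceptual model of a memory layer: a compact, user-managed fact
# store injected into each prompt, independent of full chat history.
class MemoryStore:
    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    def remember(self, key: str, fact: str) -> None:
        self._facts[key] = fact          # user-curated or chat-extracted

    def forget(self, key: str) -> None:
        self._facts.pop(key, None)       # user can delete items

    def as_context(self) -> str:
        return "\n".join(f"- {f}" for f in self._facts.values())

def build_prompt(memory: MemoryStore, user_msg: str) -> str:
    # Memory facts are injected every time; past turns are not.
    return (f"Known about this user:\n{memory.as_context()}\n\n"
            f"User: {user_msg}")

mem = MemoryStore()
mem.remember("role", "User works on the finance team")
mem.remember("format", "Prefers bulleted summaries")
print(build_prompt(mem, "Summarize this report"))
```

Because the store holds a handful of distilled facts rather than whole transcripts, it adds personalization at a tiny token cost compared with replaying chat history.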

The existence of memory means Copilot can reference and build upon previous user interactions and personalizations in a compact, relevant way.

·····

Conversation history and data retention policies vary by Copilot environment and compliance configuration.

The storage and retention of Copilot conversation logs differ from the system’s operational memory and context window. In consumer Copilot, conversation histories are stored for up to 18 months when users are signed in, enabling review and deletion but not guaranteeing the inclusion of all history in future prompts.

For enterprise users, retention of Copilot chat logs and data is managed by Microsoft Purview or other compliance tools, which can set automatic deletion, retention holds, or auditing requirements. Copilot memory remains outside these controls as a personalization layer.

In embedded or product-specific Copilot deployments, retention and context rules may be shaped by application needs, tenant policy, and local security governance.

·····

Enterprise, developer, and embedded Copilot environments rely on a combination of token windows, retrieval limits, orchestration, and quotas.

In Copilot Studio and embedded Copilot products, model context windows are enforced alongside orchestration and runtime controls. The context for each prompt is assembled from current user input, relevant history, retrievals from connected content, and product metadata. Additional limits, such as execution timeouts, request quotas, file size caps, and security protocols, further shape the real-world capacity for memory and context in these environments.

Microsoft’s design allows Copilot to adapt to high-volume, large-scale enterprise use cases while remaining secure and compliant with organizational policies and user privacy requirements.

........

Summary table: Microsoft Copilot context, memory, and retention facts

| Dimension | Microsoft 365 Copilot | Copilot Studio | GitHub Copilot Chat | Consumer Copilot |
|---|---|---|---|---|
| Max context window | Retrieval and orchestration, ~300 pages practical | 128K–400K tokens (model) | 64K tokens (GPT-4o) | Not published |
| Chat history retention | Enterprise and compliance policy | Tenant logs and quotas | Session and product policy | 18 months (default) |
| Memory support | Yes, user-managed | Agent configuration | Not primary | Not primary |
| Retrieval/grounding | Yes, via Microsoft Graph | Yes, per agent | Yes, for code/repo | Limited |
| Throttling/quotas | Yes, enterprise and product | Yes, runtime, tokens | Session-based | Not published |

·····

DATA STUDIOS