“Copilot Vision” Now Available on Windows: AI That Understands Your Screen in Real Time
- Graziano Stefanelli
- 5 days ago
- 4 min read

Microsoft has officially launched Copilot Vision, a new AI-powered feature designed to enhance productivity on Windows by allowing the Copilot assistant to visually interpret what’s displayed on your screen. This marks a major step in Microsoft’s strategy to evolve Copilot from a text-based chatbot into a fully multimodal digital assistant that understands images, apps, documents, and user intent in real time.
Full Timeline: How Copilot Vision Evolved from Labs to Desktop

Microsoft first introduced Copilot Vision in December 2024, quietly embedding it into Copilot Labs for Edge.
Early testers with Copilot Pro access could use the feature to analyze webpage screenshots and receive AI-generated feedback. This limited preview laid the groundwork for what would become a far more comprehensive rollout.
In March 2025, Microsoft expanded the scope to include Android camera-based input for Copilot Pro users and also enabled free Edge browser access to Vision in the United States. This version allowed users to snap photos of real-world objects or page contents and receive AI-generated explanations, translations, or step-by-step guidance.
Finally, on June 12, 2025, Microsoft launched Copilot Vision for all Windows 10 and Windows 11 users in the United States. The tool is now freely available—no Copilot Pro subscription required—and it introduces a new productivity feature called Highlights, which overlays contextual visual cues directly on your desktop screen.
Core Features of Copilot Vision on Windows

1. Visual Understanding of Apps and Content
Copilot Vision allows users to share what they are seeing on screen by clicking a glasses icon in the Copilot sidebar. Once activated, Copilot gains visual access to the selected windows (up to two at once) and interprets the layout, content, and interactive elements. The assistant can then offer intelligent suggestions, guidance, or insights based on what’s visible.
This functionality transforms Copilot from a passive tool into an active collaborator that understands not only text input, but also visual context. For instance, it can help users navigate software interfaces, correct errors, translate foreign-language text, compare spreadsheet data, or summarize PDFs—without requiring the user to explain what they’re looking at.
2. “Highlights” — Context-Aware Guidance
The Highlights feature draws attention to specific elements on your screen by outlining them or suggesting next steps. If you're unsure what to click next in a software workflow, Copilot can literally “point” to the right button or field with a visual indicator and explain the reasoning. This makes it particularly valuable for complex applications or unfamiliar tools.
Microsoft has confirmed that Highlights currently supports dual-window interaction, meaning the assistant can analyze two applications simultaneously—such as comparing numbers in Excel with those in a Word report, or aligning a design in PowerPoint with content in OneNote.
3. Voice-Powered Interaction
Vision is tightly integrated with Copilot Voice, allowing users to speak natural-language prompts like “What does this warning mean?” or “Rewrite this paragraph for a customer.” The system responds using one of four available synthetic voices, supporting more than 40 languages. This functionality enables hands-free interaction and further expands accessibility for users with vision or mobility impairments.
Privacy, Data Handling, and Enterprise Controls
Unlike Microsoft’s experimental “Recall” feature (which was paused amid controversy), Copilot Vision is fully opt-in. Users must manually activate the feature, and it only processes the content of windows that are explicitly shared. When the session ends, Copilot loses all visibility into the screen.
At launch, screen data is processed in Microsoft Azure—not stored or retained—and enterprise users retain full control through Microsoft 365 security and governance policies. Admins can:
- Disable Copilot Vision at the tenant level
- Limit its use to specific approved applications
- Require user confirmation before each use
Future iterations on Copilot+ PCs will offload many Vision tasks to the local NPU (neural processing unit), reducing latency and improving privacy by keeping data on-device.
Availability and Expansion Plans
As of June 2025, Copilot Vision is available to all Windows 10 and Windows 11 users in the United States at no additional cost. Rollouts to other markets outside Europe, including Canada, Japan, Australia, and India, are scheduled to follow in the coming weeks.
The European Union rollout is currently on hold due to regulatory scrutiny under the Digital Markets Act and the upcoming EU AI Act. Microsoft has confirmed that localized testing is underway, and final availability will depend on ongoing regulatory approval.
How Copilot Vision Stands Out
- Multimodal Functionality: It combines visual perception, voice input, and language understanding in a single assistant.
- Cross-App Contextual Reasoning: Users can interact with multiple applications simultaneously and receive synthesized feedback.
- Guided Visual Instruction: Unlike most voice assistants, Copilot Vision literally shows users what to do next via on-screen overlays.
- Enterprise-Ready Security: Microsoft’s existing data protection policies are fully enforced, and usage is tightly scoped.
Key Differences from Previous Reports
- Dual-window support: Confirmed for all users at launch.
- Voice + visual synergy: Real-time verbal and on-screen assistance now works simultaneously.
- Hardware roadmap: Copilot Vision will shift to local processing on next-gen Windows AI PCs.
- Enterprise integration: Advanced controls now available for IT departments.
DATA STUDIOS