Meta AI: Rollout updates for advanced models and feature expansions
- Graziano Stefanelli

Meta AI has accelerated its deployment of higher-tier Llama models, introducing larger context windows, faster throughput, and expanded multimodal capabilities. The rollout strategy is structured to move from tightly controlled design-partner programs to general availability, ensuring stability and compliance while delivering performance improvements to all user tiers.
The new model rollout follows a phased approach.
| Phase | Description |
| --- | --- |
| Design-partner trial | Limited to fewer than 500 users, focused on high-load and complex scenario testing. |
| Open beta | Available to subscription tiers with a feature toggle for two weeks. |
| Phased general release | Rolled out to Plus and Enterprise first, then to the Free tier within 10–14 days. |
| Enterprise enablement | Activated after SSO and encryption key validation for each organisation. |
This measured sequence reduces disruption and provides Meta with actionable telemetry from early-stage users before large-scale release.
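As a concrete illustration, here is a minimal Python sketch of how a client might resolve model visibility from the current rollout phase. The phase names mirror the table above, but the tier names and gating logic are hypothetical, since Meta has not published its rollout mechanics.

```python
from enum import Enum

class RolloutPhase(Enum):
    DESIGN_PARTNER = 1   # fewer than 500 users, high-load testing
    OPEN_BETA = 2        # subscription tiers behind a feature toggle
    GENERAL_RELEASE = 3  # Plus/Enterprise first, then Free
    ENTERPRISE = 4       # gated on SSO and encryption key validation

def model_enabled(phase: RolloutPhase, tier: str, toggle_on: bool) -> bool:
    """Decide whether a new model is visible to a user (hypothetical logic)."""
    if phase is RolloutPhase.DESIGN_PARTNER:
        return tier == "design_partner"
    if phase is RolloutPhase.OPEN_BETA:
        return tier in {"plus", "meta_ai_plus", "enterprise"} and toggle_on
    if phase is RolloutPhase.GENERAL_RELEASE:
        return tier in {"plus", "enterprise", "free"}
    return tier == "enterprise"  # enterprise enablement gate

print(model_enabled(RolloutPhase.OPEN_BETA, "plus", toggle_on=True))  # True
```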
New models bring higher limits and faster execution.
| Model version | Max context (tokens) | Output speed (tokens/s) | Multimodal support | Special features |
| --- | --- | --- | --- | --- |
| Llama 3.5 Turbo | 32,000 | 70 | Limited image beta | — |
| Llama 4 Turbo | 64,000 | 92 | Full image analysis | Partial tool-calling upgrade |
| Llama 4 Deep Think | 128,000 | 60 | Full image analysis, audio transcription | Chain-of-thought trace with tagging |
| Llama 4 Ultra | 256,000 | — | Video + code fusion (private alpha) | Not yet released for public use |
These upgrades allow more complex projects to be handled in a single conversation, reducing the need for manual summarisation or splitting inputs into multiple sessions.
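For planning purposes, the context limits in the table can be used to pick the smallest model that fits a given input. The sketch below assumes a crude four-characters-per-token estimate rather than Meta's actual tokeniser, and reserves some room for the response.

```python
# Context limits taken from the table above.
CONTEXT_LIMITS = {
    "Llama 3.5 Turbo": 32_000,
    "Llama 4 Turbo": 64_000,
    "Llama 4 Deep Think": 128_000,
    "Llama 4 Ultra": 256_000,
}

def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token), not the real tokeniser.
    return max(1, len(text) // 4)

def pick_model(prompt: str, reserve_for_output: int = 4_000) -> str:
    """Return the smallest model whose context window fits prompt + reply."""
    needed = estimate_tokens(prompt) + reserve_for_output
    for model, limit in sorted(CONTEXT_LIMITS.items(), key=lambda kv: kv[1]):
        if needed <= limit:
            return model
    raise ValueError("Prompt exceeds the largest available context window")

print(pick_model("..." * 50_000))  # ~37,500 tokens + reserve -> "Llama 4 Turbo"
```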
Availability varies by plan and location.
| Plan | Default model | Highest model opt-in | Access notes |
| --- | --- | --- | --- |
| Free | Llama 4 Turbo | Deep Think (10 calls/day) | Global, no cost changes |
| Plus | Llama 4 Turbo | Deep Think GA | Included in subscription |
| Meta AI+ | Llama 4 Turbo | Deep Think beta | 15% token surcharge for beta use |
| Enterprise | Custom mix | Deep Think or Ultra | Per-organisation pricing and governance controls |
Deep Think remains opt-in for most plans, with Ultra restricted to a small number of enterprise partners.
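To see how the Meta AI+ surcharge works out in practice, a small sketch: the 15% figure comes from the table, while the base per-token rate is an invented placeholder, as no pricing is quoted here.

```python
BETA_SURCHARGE = 0.15               # from the Meta AI+ access notes above
BASE_RATE_PER_1K_TOKENS = 0.002     # hypothetical rate, not a published price

def billed_cost(tokens: int, beta: bool) -> float:
    """Apply the 15% beta surcharge on top of the base token cost."""
    cost = tokens / 1_000 * BASE_RATE_PER_1K_TOKENS
    return cost * (1 + BETA_SURCHARGE) if beta else cost

print(f"{billed_cost(500_000, beta=True):.4f}")   # 1.1500 vs 1.0000 without beta
```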
Governance tools accompany each rollout.
| Control | Purpose |
| --- | --- |
| Model allow-list | Restricts available versions for a workspace. |
| Spend caps | Separate token and call limits by model tier. |
| Audit tagging | Flags preview calls with model-stage metadata. |
| Data residency | Ensures processing stays in EU, US, or APAC clusters. |
| No-train setting | Prevents conversation data from being stored or reused. |
These safeguards enable organisations to adopt new models without losing oversight of usage or compliance.
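A workspace policy combining these controls might be expressed roughly as follows. The field names and structure are illustrative only and do not reflect Meta's actual admin API.

```python
# Hypothetical shape of a workspace governance policy covering the
# five controls listed above.
workspace_policy = {
    "model_allow_list": ["Llama 4 Turbo", "Llama 4 Deep Think"],
    "spend_caps": {  # separate token and call limits per model tier
        "Llama 4 Turbo": {"tokens_per_day": 2_000_000, "calls_per_day": 5_000},
        "Llama 4 Deep Think": {"tokens_per_day": 500_000, "calls_per_day": 500},
    },
    "audit_tagging": True,   # tag preview calls with model-stage metadata
    "data_residency": "EU",  # one of "EU", "US", "APAC"
    "no_train": True,        # conversation data not stored or reused
}

def is_call_allowed(policy: dict, model: str,
                    tokens_used_today: int, tokens_requested: int) -> bool:
    """Check a request against the allow-list and daily token cap (sketch)."""
    if model not in policy["model_allow_list"]:
        return False
    cap = policy["spend_caps"][model]["tokens_per_day"]
    return tokens_used_today + tokens_requested <= cap

print(is_call_allowed(workspace_policy, "Llama 4 Ultra", 0, 1_000))           # False
print(is_call_allowed(workspace_policy, "Llama 4 Deep Think", 499_000, 500))  # True
```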
Performance benchmarks show tangible improvements.
| Task | Llama 4 Turbo | Llama 4 Deep Think |
| --- | --- | --- |
| PDF Q&A with 10,000 tokens and images | 1.9 s first token, 420 ms retrieval | 2.6 s first token, 390 ms retrieval |
| Multi-image scene analysis | 5.3 s total | 5.0 s total |
| Audio-to-summary for 2-minute clip | — | 11.8 s total |
Deep Think accepts slightly higher first-token latency in exchange for longer memory and deeper reasoning, making it more effective for extended projects.
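First-token latency is also straightforward to verify locally. The harness below times any streaming generator, so it works with whichever SDK you use; fake_stream is a stand-in, not a Meta client.

```python
import time
from typing import Callable, Iterable, Tuple

def measure_first_token(stream_fn: Callable[[], Iterable[str]]) -> Tuple[float, float]:
    """Return (first_token_latency_s, total_s) for any streaming generator."""
    start = time.perf_counter()
    first = 0.0
    for i, _ in enumerate(stream_fn()):
        if i == 0:
            first = time.perf_counter() - start
    return first, time.perf_counter() - start

def fake_stream():
    # Placeholder that mimics a model streaming tokens after a delay.
    time.sleep(0.2)
    for token in ["Meta", " AI", " response"]:
        yield token
        time.sleep(0.01)

first, total = measure_first_token(fake_stream)
print(f"first token: {first:.3f}s, total: {total:.3f}s")
```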
Known issues are tracked with workarounds.
| Issue | Affected model | Suggested workaround |
| --- | --- | --- |
| Memory truncation at 110,000 tokens | Deep Think | Split the session and send a summarised recap |
| Evening latency spikes in EU | Turbo (Free tier) | Schedule generation during off-peak hours |
| Occasional JSON mismatch in tools | Turbo beta tools | Re-register the schema or simplify parameters |
Meta publishes these limitations with recommended mitigation steps in the model release notes.
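The truncation workaround can be automated. The sketch below starts a fresh history with a recap once a session approaches the 110,000-token cutoff; summarize() is a placeholder (in practice you would ask the model itself for the recap) and the token estimate is a rough heuristic.

```python
TRUNCATION_POINT = 110_000  # observed Deep Think cutoff from the table above
HEADROOM = 10_000           # split early to leave room below the cutoff

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic, not the real tokeniser

def summarize(history: list[str]) -> str:
    # Placeholder: in practice, ask the model itself for a recap.
    return "Recap of earlier discussion: " + " | ".join(m[:40] for m in history)

def maybe_split_session(history: list[str]) -> list[str]:
    """Start a fresh history with a recap once the session nears the limit."""
    total = sum(estimate_tokens(m) for m in history)
    if total >= TRUNCATION_POINT - HEADROOM:
        return [summarize(history)]
    return history

history = ["long message " * 10_000] * 4   # ~130,000 estimated tokens
print(len(maybe_split_session(history)))   # 1 -> session was split and recapped
```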
The roadmap points toward even larger context and customisation.
Planned upgrades include a fine-tuning toolkit for Llama 4 Turbo, enabling LoRA adaptation for up to 25 million tokens; on-device inference for Ray-Ban smart glasses using quantised Llama 3.5; and a code execution sandbox inside chats for Deep Think. These features are expected to extend both the autonomy and portability of the models in production environments.