Meta AI: Rollout updates for advanced models and feature expansions
- Graziano Stefanelli

Meta AI has accelerated its deployment of higher-tier Llama models, introducing larger context windows, faster throughput, and expanded multimodal capabilities. The rollout strategy is structured to move from tightly controlled design-partner programs to general availability, ensuring stability and compliance while delivering performance improvements to all user tiers.
The new model rollout follows a phased approach.
| Phase | Description |
| --- | --- |
| Design-partner trial | Limited to fewer than 500 users, focused on high-load and complex scenario testing. |
| Open beta | Available to subscription tiers with a feature toggle for two weeks. |
| Phased general release | Rolled out to Plus and Enterprise first, then to the Free tier within 10–14 days. |
| Enterprise enablement | Activated after SSO and encryption key validation for each organisation. |
This measured sequence reduces disruption and provides Meta with actionable telemetry from early-stage users before large-scale release.
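As a concrete illustration, here is a minimal Python sketch of how a client might resolve model visibility from the current rollout phase. The phase names mirror the table above, but the tier names and gating logic are hypothetical, since Meta has not published its rollout mechanics.

```python
from enum import Enum

class RolloutPhase(Enum):
    DESIGN_PARTNER = 1   # fewer than 500 users, high-load testing
    OPEN_BETA = 2        # subscription tiers behind a feature toggle
    GENERAL_RELEASE = 3  # Plus/Enterprise first, then Free
    ENTERPRISE = 4       # gated on SSO and encryption key validation

def model_enabled(phase: RolloutPhase, tier: str, toggle_on: bool) -> bool:
    """Decide whether a new model is visible to a user (hypothetical logic)."""
    if phase is RolloutPhase.DESIGN_PARTNER:
        return tier == "design_partner"
    if phase is RolloutPhase.OPEN_BETA:
        return tier in {"plus", "meta_ai_plus", "enterprise"} and toggle_on
    if phase is RolloutPhase.GENERAL_RELEASE:
        return tier in {"plus", "enterprise", "free"}
    return tier == "enterprise"  # enterprise enablement gate

print(model_enabled(RolloutPhase.OPEN_BETA, "plus", toggle_on=True))  # True
```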
New models bring higher limits and faster execution.
| Model version | Max context (tokens) | Output speed (tokens/s) | Multimodal support | Special features |
| --- | --- | --- | --- | --- |
| Llama 3.5 Turbo | 32,000 | 70 | Limited image beta | — |
| Llama 4 Turbo | 64,000 | 92 | Full image analysis | Partial tool-calling upgrade |
| Llama 4 Deep Think | 128,000 | 60 | Full image analysis, audio transcription | Chain-of-thought trace with tagging |
| Llama 4 Ultra | 256,000 | — | Video + code fusion (private alpha) | Not yet released for public use |
These upgrades allow more complex projects to be handled in a single conversation, reducing the need for manual summarisation or splitting inputs into multiple sessions.
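For planning purposes, the context limits in the table can be used to pick the smallest model that fits a given input. The sketch below assumes a crude four-characters-per-token estimate rather than Meta's actual tokeniser, and reserves some room for the response.

```python
# Context limits taken from the table above.
CONTEXT_LIMITS = {
    "Llama 3.5 Turbo": 32_000,
    "Llama 4 Turbo": 64_000,
    "Llama 4 Deep Think": 128_000,
    "Llama 4 Ultra": 256_000,
}

def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token), not the real tokeniser.
    return max(1, len(text) // 4)

def pick_model(prompt: str, reserve_for_output: int = 4_000) -> str:
    """Return the smallest model whose context window fits prompt + reply."""
    needed = estimate_tokens(prompt) + reserve_for_output
    for model, limit in sorted(CONTEXT_LIMITS.items(), key=lambda kv: kv[1]):
        if needed <= limit:
            return model
    raise ValueError("Prompt exceeds the largest available context window")

print(pick_model("..." * 50_000))  # ~37,500 tokens + reserve -> "Llama 4 Turbo"
```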
Availability varies by plan and location.
| Plan | Default model | Highest model opt-in | Access notes |
| --- | --- | --- | --- |
| Free | Llama 4 Turbo | Deep Think (10 calls/day) | Global, no cost changes |
| Plus | Llama 4 Turbo | Deep Think GA | Included in subscription |
| Meta AI+ | Llama 4 Turbo | Deep Think beta | 15% token surcharge for beta use |
| Enterprise | Custom mix | Deep Think or Ultra | Per-organisation pricing and governance controls |
Deep Think remains opt-in for most plans, with Ultra restricted to a small number of enterprise partners.
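To see how the Meta AI+ surcharge works out in practice, a small sketch: the 15% figure comes from the table, while the base per-token rate is an invented placeholder, as no pricing is quoted here.

```python
BETA_SURCHARGE = 0.15               # from the Meta AI+ access notes above
BASE_RATE_PER_1K_TOKENS = 0.002     # hypothetical rate, not a published price

def billed_cost(tokens: int, beta: bool) -> float:
    """Apply the 15% beta surcharge on top of the base token cost."""
    cost = tokens / 1_000 * BASE_RATE_PER_1K_TOKENS
    return cost * (1 + BETA_SURCHARGE) if beta else cost

print(f"{billed_cost(500_000, beta=True):.4f}")   # 1.1500 vs 1.0000 without beta
```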
Governance tools accompany each rollout.
| Control | Purpose |
| --- | --- |
| Model allow-list | Restricts available versions for a workspace. |
| Spend caps | Separate token and call limits by model tier. |
| Audit tagging | Flags preview calls with model-stage metadata. |
| Data residency | Ensures processing stays in EU, US, or APAC clusters. |
| No-train setting | Prevents conversation data from being stored or reused. |
These safeguards enable organisations to adopt new models without losing oversight of usage or compliance.
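A workspace policy combining these controls might be expressed roughly as follows. The field names and structure are illustrative only and do not reflect Meta's actual admin API.

```python
# Hypothetical shape of a workspace governance policy covering the
# five controls listed above.
workspace_policy = {
    "model_allow_list": ["Llama 4 Turbo", "Llama 4 Deep Think"],
    "spend_caps": {  # separate token and call limits per model tier
        "Llama 4 Turbo": {"tokens_per_day": 2_000_000, "calls_per_day": 5_000},
        "Llama 4 Deep Think": {"tokens_per_day": 500_000, "calls_per_day": 500},
    },
    "audit_tagging": True,   # tag preview calls with model-stage metadata
    "data_residency": "EU",  # one of "EU", "US", "APAC"
    "no_train": True,        # conversation data not stored or reused
}

def is_call_allowed(policy: dict, model: str,
                    tokens_used_today: int, tokens_requested: int) -> bool:
    """Check a request against the allow-list and daily token cap (sketch)."""
    if model not in policy["model_allow_list"]:
        return False
    cap = policy["spend_caps"][model]["tokens_per_day"]
    return tokens_used_today + tokens_requested <= cap

print(is_call_allowed(workspace_policy, "Llama 4 Ultra", 0, 1_000))           # False
print(is_call_allowed(workspace_policy, "Llama 4 Deep Think", 499_000, 500))  # True
```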
Performance benchmarks show tangible improvements.
| Task | Llama 4 Turbo | Llama 4 Deep Think |
| --- | --- | --- |
| PDF Q&A with 10,000 tokens and images | 1.9 s first token, 420 ms retrieval | 2.6 s first token, 390 ms retrieval |
| Multi-image scene analysis | 5.3 s total | 5.0 s total |
| Audio-to-summary for 2-minute clip | — | 11.8 s total |
Deep Think accepts slightly higher first-token latency in exchange for longer memory and deeper reasoning, making it more effective for extended projects.
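First-token latency is also straightforward to verify locally. The harness below times any streaming generator, so it works with whichever SDK you use; fake_stream is a stand-in, not a Meta client.

```python
import time
from typing import Callable, Iterable, Tuple

def measure_first_token(stream_fn: Callable[[], Iterable[str]]) -> Tuple[float, float]:
    """Return (first_token_latency_s, total_s) for any streaming generator."""
    start = time.perf_counter()
    first = 0.0
    for i, _ in enumerate(stream_fn()):
        if i == 0:
            first = time.perf_counter() - start
    return first, time.perf_counter() - start

def fake_stream():
    # Placeholder that mimics a model streaming tokens after a delay.
    time.sleep(0.2)
    for token in ["Meta", " AI", " response"]:
        yield token
        time.sleep(0.01)

first, total = measure_first_token(fake_stream)
print(f"first token: {first:.3f}s, total: {total:.3f}s")
```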
Known issues are tracked with workarounds.
| Issue | Affected model | Suggested workaround |
| --- | --- | --- |
| Memory truncation at 110,000 tokens | Deep Think | Split the session and send a summarised recap |
| Evening latency spikes in EU | Turbo (Free tier) | Schedule generation during off-peak hours |
| Occasional JSON mismatch in tools | Turbo beta tools | Re-register the schema or simplify parameters |
Meta publishes these limitations with recommended mitigation steps in the model release notes.
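The truncation workaround can be automated. The sketch below starts a fresh history with a recap once a session approaches the 110,000-token cutoff; summarize() is a placeholder (in practice you would ask the model itself for the recap) and the token estimate is a rough heuristic.

```python
TRUNCATION_POINT = 110_000  # observed Deep Think cutoff from the table above
HEADROOM = 10_000           # split early to leave room below the cutoff

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic, not the real tokeniser

def summarize(history: list[str]) -> str:
    # Placeholder: in practice, ask the model itself for a recap.
    return "Recap of earlier discussion: " + " | ".join(m[:40] for m in history)

def maybe_split_session(history: list[str]) -> list[str]:
    """Start a fresh history with a recap once the session nears the limit."""
    total = sum(estimate_tokens(m) for m in history)
    if total >= TRUNCATION_POINT - HEADROOM:
        return [summarize(history)]
    return history

history = ["long message " * 10_000] * 4   # ~130,000 estimated tokens
print(len(maybe_split_session(history)))   # 1 -> session was split and recapped
```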
The roadmap points toward even larger context and customisation.
Planned upgrades include a fine-tuning toolkit for Llama 4 Turbo, enabling LoRA adaptation for up to 25 million tokens; on-device inference for Ray-Ban smart glasses using quantised Llama 3.5; and a code execution sandbox inside chats for Deep Think. These features are expected to extend both the autonomy and portability of the models in production environments.