Meta Wants to Invest $15 Billion in Scale AI to Secure Strategic Control Over High-Quality AI Training Data
- Graziano Stefanelli
- Jun 12

Meta plans to invest $15 billion in Scale AI, acquiring a 49% non-voting stake. The company is moving to secure strategic access to high-quality, human-annotated data, which is increasingly central to building advanced AI.
With this investment, Meta intends to reduce reliance on external data providers and strengthen its position as other tech giants compete for the same resources.
Scale AI, led by Alexandr Wang, has built a reputation as a top provider of large-scale data labeling and infrastructure. Bringing Wang into senior leadership at Meta would add new expertise and help drive internal research on next-generation models.
The deal is still pending approval, but it is already drawing the attention of regulators and major players across the industry.
1. A turnaround-size problem inside Menlo Park
For all the noise Meta made by open-sourcing Llama 2 in 2023, the company’s generative-AI momentum sputtered this spring when its flagship Llama 4 family debuted and promptly underperformed rival models from DeepSeek and OpenAI. Internal dashboards showed research output slowing and top talent leaving for Anthropic, Mistral and Safe Superintelligence; third-party trackers estimate Meta lost 4.3 percent of its senior AI staff in 2024 alone. Mark Zuckerberg responded with an aggressive “reset” directive: secure privileged access to the one input every foundation model still needs (vast quantities of impeccably labeled data) and bring in fresh leadership to restore morale.
2. Anatomy of the proposed deal
According to multiple reports first surfaced by Bloomberg and confirmed by Reuters, Meta intends to buy a 49 percent non-voting stake in San Francisco-based Scale AI for roughly $14.8 billion to $15 billion. Scale’s 28-year-old CEO Alexandr Wang would remain in charge of the startup while simultaneously taking a senior post inside Meta to launch a new “superintelligence” research group reporting directly to Zuckerberg and chief scientist Yann LeCun. Recruiting packages north of seven figures are already circulating to lure DeepMind and OpenAI veterans into that unit.
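As a rough sanity check on those figures, the reported price and stake imply a valuation for the whole company. The sketch below assumes a simple proportional relationship (price divided by stake), which ignores share classes, preferences and other deal terms that have not been disclosed; the function name `implied_valuation` is only illustrative.

```python
def implied_valuation(price_billion: float, stake_fraction: float) -> float:
    """Company value implied by paying `price_billion` for `stake_fraction` of it.

    Assumption: value scales proportionally with ownership, with no
    adjustment for share classes or other undisclosed deal terms.
    """
    if not 0 < stake_fraction <= 1:
        raise ValueError("stake_fraction must be in (0, 1]")
    return price_billion / stake_fraction


STAKE = 0.49                          # 49 percent non-voting stake, as reported
LOW_PRICE, HIGH_PRICE = 14.8, 15.0    # reported price range, in $ billions

low = implied_valuation(LOW_PRICE, STAKE)
high = implied_valuation(HIGH_PRICE, STAKE)
print(f"Implied valuation: ${low:.1f}B to ${high:.1f}B")
# prints: Implied valuation: $30.2B to $30.6B
```

On that simplified basis, the deal would value Scale at a little over $30 billion.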
3. Scale AI in brief: nine years from dorm-room code to data linchpin
Origins and early funding. Scale began life in 2016 as Scale API inside Y Combinator, founded by MIT dropout Wang and former Quora engineer Lucy Guo. Accel backed the company with a $4.5 million Series A; Index Ventures led an $18 million Series B a year later.
Product pivot. The original “Mechanical Turk-with-an-API” quickly morphed into a full-stack data-labeling service for self-driving-car datasets, then broadened to a horizontal Data Engine that now supplies training, RLHF and red-teaming data to almost every frontier-model lab.
Capital scale-up. A 2024 Amazon-led round put a $13.8 billion valuation on the company; by March 2025 insiders were marketing a tender offer at up to $25 billion.
Today’s footprint. Scale employs roughly 900 staff across engineering, research and sales while coordinating tens of thousands of gig-based annotators through its Remotasks and Outlier subsidiaries.
4. What exactly does Scale sell?
Scale’s revenue—$870 million in 2024 and projected to exceed $2 billion in 2025—comes from three intertwined offerings.
Core Data Labeling. Human-in-the-loop annotation pipelines for images, text, 3-D point clouds and code. The company claims “single-digit error rates” on safety-critical datasets for automotive, defence and healthcare clients.
Synthetic-Data & RLHF Studio. A managed service that blends model-generated examples with expert oversight to cheapen long-tail data creation, plus reinforcement-learning feedback loops for preference tuning.
SEAL Evaluation Lab. A private, adversarial-testing framework (“Humanity’s Last Exam”) used by OpenAI, Meta and the U.S. Department of Defense to red-team new models before launch.
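For a sense of the growth these offerings are expected to deliver, the revenue figures quoted at the start of this section can be turned into a year-over-year multiple. This is a back-of-the-envelope sketch: it treats the "exceed $2 billion" projection as a floor of exactly $2 billion, and the variable names are illustrative.

```python
REVENUE_2024_M = 870          # 2024 revenue in $ millions, as quoted above
REVENUE_2025_FLOOR_M = 2000   # assumption: projection treated as a $2B floor


def growth_multiple(previous: float, current: float) -> float:
    """Ratio of current-period revenue to previous-period revenue."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return current / previous


multiple = growth_multiple(REVENUE_2024_M, REVENUE_2025_FLOOR_M)
growth_pct = (multiple - 1) * 100
print(f"At least {multiple:.2f}x, i.e. about {growth_pct:.0f}% growth")
# prints: At least 2.30x, i.e. about 130% growth
```

Because the projection is a lower bound, the true multiple would be higher if the forecast is met.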
5. The contentious side of a data empire
Scale’s gig-economy backbone has drawn repeated scrutiny. A 2023 class action in San Francisco accused the firm of wage theft and arbitrary task cancellations for Filipino workers earning pennies per label. While the U.S. Department of Labor closed a broader FLSA investigation in May 2025 without enforcement, labor-rights NGOs continue to flag “digital sweatshop” conditions in several countries.
6. Why Meta needs Scale now
Meta trains its foundation models on about 1.6 trillion tokens per generation; internal planning documents suggest that number must triple to stay competitive with GPT-5. Yet half its existing corpus is synthetic or scraped and carries uncertain licensing status. By tucking Scale’s premium annotation queue inside the corporate perimeter, Meta gains:
Priority access to scarce, high-fidelity human labels that rival labs may suddenly struggle to buy.
Integrated feedback loops—Scale’s evaluation data can be piped straight back into Meta’s model-safety team, shortening iteration cycles.
A charismatic operator. Wang’s growth-at-all-costs style complements Zuckerberg’s appetite for audacious, multi-year bets.
7. Risks and open questions
Regulators can still challenge minority deals that entrench gatekeepers; the FTC has signalled it is reviewing this one under Clayton Act Section 7. Competitors like Turing and Surge AI are already courting Scale customers unsettled by the Meta tie-up. And inside Meta, veteran researchers privately worry about placing a data-operations founder—rather than a publishing academic—at the helm of a “superintelligence” lab. Whether the hybrid leadership model works may hinge on how much autonomy Wang actually retains.
8. The industry ripple effect
If the agreement closes, two outcomes are almost certain. First, annotation prices will rise as AI labs scramble to replace a suddenly “vertically integrated” supplier. Second, more cloud-platform incumbents—Microsoft, Google, Amazon—will feel pressure to lock down their own data pipelines, either by acquiring specialist firms or by scaling synthetic-data research to offset the shortage.
Although final signatures are still pending, Meta’s planned stake in Scale AI reads less like a side bet than a strategic hinge: a $15 billion wager that proprietary, rigorously curated data, and the people who can marshal it, remain the decisive currency of frontier AI. If that thesis is right, Zuckerberg may have bought himself the raw material for a comeback. If it’s wrong, Meta will own half of a very expensive data factory that its rivals no longer wish to visit.
DATA STUDIOS




