xAI has reportedly cut around 500 jobs from its data annotation team, a substantial move that changes how the company trains its chatbot, Grok. Internal messages reviewed by Business Insider describe a pivot away from “general AI tutor” work and toward a strategy built around domain specialists.
The reduction is reported to affect roughly a third of a 1,500-person annotation team. In tandem, xAI said it plans to increase the number of its specialist tutors across areas spanning STEM, finance, and medicine through to safety, telling potential candidates on X that it plans to grow those ranks “by 10x”.

From Generalist Raters to Domain Experts
Annotation teams take on the painstaking work of labeling, scoring, and ranking model outputs, tasks that are instrumental for supervised fine-tuning and reinforcement learning from human feedback. A “general AI tutor” typically judges against a loose definition of conversational quality, whereas a specialist applies domain knowledge: grading individual reasoning steps, checking calculations, and verifying citations in sensitive fields.
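To make that distinction concrete, here is a minimal, hypothetical Python sketch of the kind of record each type of rater might produce. The field names and structures are invented for illustration and do not describe xAI’s actual tooling.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class StepGrade:
    """A specialist's verdict on one reasoning step in a model response."""
    step_index: int
    correct: bool
    note: str = ""

@dataclass
class AnnotationRecord:
    """One rated model output, usable for SFT filtering or RLHF preference data."""
    prompt: str
    response: str
    overall_score: int            # generalist: holistic 1-5 quality rating
    domain: Optional[str] = None  # specialist work carries a domain tag
    step_grades: list[StepGrade] = field(default_factory=list)  # specialist per-step audit

# A generalist record scores the whole exchange loosely...
generalist = AnnotationRecord(
    prompt="Explain compound interest.",
    response="Compound interest means you earn interest on prior interest...",
    overall_score=4,
)

# ...while a specialist record grades individual reasoning steps.
specialist = AnnotationRecord(
    prompt="A $1,000 deposit earns 5% compounded annually. Value after 2 years?",
    response="Year 1: 1000 * 1.05 = 1050. Year 2: 1050 * 1.05 = 1102.50.",
    overall_score=5,
    domain="finance",
    step_grades=[
        StepGrade(0, True, "Correct first-year growth."),
        StepGrade(1, True, "Compounding applied correctly."),
    ],
)
```

The step-level grades are what make specialist data more expensive to collect but far more informative for training models to reason carefully.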
The turn to experts reflects a broader industry shift: as top foundation models converge on a functional baseline, companies differentiate themselves through specific kinds of reasoning and safety guardrails. Researchers and practitioners have repeatedly observed that expert-curated datasets deliver disproportionate gains in high-complexity domains, reducing hallucinations and improving factual reliability in areas such as clinical guidance or financial analysis.
The trade-off is cost and availability. Specialist raters are scarcer and more expensive than generalist crowd workers. Companies like Scale AI, Surge AI, Appen, TELUS International AI Data Solutions, and Sama have built networks to source both generalists and domain experts, but ramping up expert work quickly can be difficult.
What It Means for Grok’s Training Pipeline
Grok’s future performance improvements will depend more and more on high-quality, expert-verified data rather than sheer volume. The Stanford AI Index has documented a growing reliance on elaborate feedback mechanisms and synthetic data generation to supplement human annotation, especially as web-scale corpora become saturated or noisy.
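As a toy illustration of how synthetic data can stretch scarce expert effort, the hypothetical Python snippet below expands a small expert-verified seed set into question-answer pairs whose answers are computed rather than guessed. Everything here is an invented assumption for illustration, not a description of any lab’s pipeline.

```python
# Toy illustration: expand a small expert-verified seed set into synthetic
# QA pairs whose answers are computed rather than guessed. Field names and
# numbers are invented; this describes no specific lab's pipeline.
SEED_FACTS = [
    # (kind, principal, annual rate, years), verified by a finance specialist
    ("deposit", 1_000, 0.05, 2),
    ("investment", 5_000, 0.07, 3),
]

def make_example(kind: str, principal: float, rate: float, years: int) -> dict:
    """Derive a QA pair whose answer follows deterministically from the seed."""
    value = principal * (1 + rate) ** years
    return {
        "prompt": f"A ${principal:,.0f} {kind} grows {rate:.0%} annually. "
                  f"What is it worth after {years} years?",
        "answer": f"${value:,.2f}",
    }

synthetic = [make_example(*seed) for seed in SEED_FACTS]
for ex in synthetic:
    print(ex["prompt"], "->", ex["answer"])
```

The appeal of this pattern is that a handful of expert-checked seeds can generate many training examples, with humans spot-checking samples rather than labeling every item.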
As xAI focuses on specialist tutors, it may automate more of the “general” feedback loop through model-assisted rating while concentrating human effort on thornier tasks. This mirrors practice at leading labs, which apply auto-raters for first-pass filtering and reserve professionals to adjudicate hard reasoning, safety edge cases, and tool-use reliability.
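A first-pass triage loop of that kind might look like the following hypothetical Python sketch, where an invented auto_rater stub stands in for a model-based judge and low-confidence items are escalated to human specialists.

```python
import random

# Hypothetical confidence threshold below which items escalate to human experts.
AUTO_ACCEPT_THRESHOLD = 0.9

def auto_rater(prompt: str, response: str) -> tuple[int, float]:
    """Stand-in for a model-based judge: returns (score 1-5, confidence 0-1).

    A real pipeline would call a reward model or an LLM judge here.
    """
    return random.randint(1, 5), random.random()

def triage(batch: list[tuple[str, str]]) -> tuple[list, list]:
    """First-pass filtering: keep confident auto-ratings, escalate the rest."""
    auto_labeled, needs_expert = [], []
    for prompt, response in batch:
        score, confidence = auto_rater(prompt, response)
        if confidence >= AUTO_ACCEPT_THRESHOLD:
            auto_labeled.append((prompt, response, score))
        else:
            # Hard reasoning, safety edge cases, and tool-use checks go to humans.
            needs_expert.append((prompt, response))
    return auto_labeled, needs_expert

batch = [
    ("What is 2 + 2?", "4"),
    ("Interpret this chest X-ray report...", "Findings suggest possible pneumonia..."),
]
auto_labeled, expert_queue = triage(batch)
print(f"{len(auto_labeled)} auto-labeled, {len(expert_queue)} escalated to specialists")
```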
The upside may be sharper performance in select areas and clearer safety standards. The danger is that if generalist review contracts too much, coverage gaps and regressions in everyday conversation can creep back in. Maintaining high overall conversational quality while ramping up expert rigor is a tightrope act.
A Jolt to the Data Labor Force
Even today, data annotation is far and away AI’s most labor-intensive line item. Market researchers have estimated the data labeling market at several billion dollars a year, spanning full-time workers and large pools of contractors. Cuts of this magnitude at a single AI developer underline how rapidly the mix of expertise in this workforce is shifting.
Worker advocates, such as groups convened by the Partnership on AI, have cautioned that sharp swings in demand from general raters to specialists risk upending annotators’ incomes and eroding institutional knowledge. Clear transition pathways, fair severance, and upskilling plans are often cited as best practice in this area, but uptake across the industry is uneven.
Open Questions Around the “10x” Plan
xAI has not disclosed the baseline from which it will multiply its expert tutors. A tenfold expansion from a fairly small nucleus could still leave a relatively lean team, especially if many positions remain contracted; growing a 50-person core by 10x, for example, would yield only 500 specialists. Crucial details, such as whether experts will be hired in-house or through vendors, how quality will be audited, and how safety reviewers will be trained, will determine whether the pivot delivers the promised gains.
Regulatory pressure also looms. The NIST AI Risk Management Framework and emerging global rules prioritize data governance, documentation, and evaluation rigor. Specialist labeling can help meet those expectations, but only if it comes with transparent processes, measurement, and red-teaming in the domains where error has an outsized impact.
For now, it is abundantly clear that xAI is rebuilding its human feedback engine around depth over breadth. Well executed, the pivot could make Grok stronger in expert workflows and earn it credibility; mismanaged, it could cost Grok the generalist polish that keeps most users engaged.
The internal messages detailing the cuts and the strategic shift were first reported by Business Insider. xAI announced its hiring push in a public post on X. The company declined to comment further on severance and retraining options for those losing their jobs, or on the planned size of its specialist cohort.