Anthropic names former Stripe CTO Rahul Patil its new chief technology officer, with Sam McCandlish transitioning to chief architect. The shake-up puts compute, infrastructure, and inference directly under Patil's purview and assigns McCandlish to pretraining and large-scale model training. Both will report to president Daniela Amodei, a structure that signals an aggressive push to scale Claude while cutting latency, cost, and energy consumption.
The company is also restructuring its core technical organization, folding product engineering more tightly into infrastructure and inference. The tighter loop is meant to speed up the roadmap and make real-world use dependable for enterprise deployments, where sustained throughput and predictable performance can matter as much as raw model quality.

Why Anthropic Is Rebuilding Its Stack for Efficiency and Scale
Anthropic faces an infrastructure arms race already at full throttle. Meta's leadership has signaled plans for a gargantuan multiyear buildout of U.S. compute, and OpenAI has secured massive capacity through Oracle's "Stargate" effort. Anthropic's own capital spending is less public, but the trend lines are clear: more GPUs, more power, and a relentless drive to cut cost per token through efficiency.
It's not only competitive urgency, though. The International Energy Agency has warned that AI could consume one-tenth of the world's electricity by 2030, and data centers built to train AI are already spreading across places like Iowa and Virginia. Uptime Institute's latest surveys still put the average data center PUE (power usage effectiveness, the ratio of total facility power to IT power, where 1.0 is ideal) in the mid-1.5s, which underscores why efficiency gains in the software stack, from compilers and kernels to serving frameworks, matter as much as new generations of hardware.
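As a rough illustration of why that figure matters, here is a minimal sketch, using entirely invented numbers, of how facility overhead and software efficiency compound:

```python
# Hypothetical illustration: how facility overhead (PUE) and software
# efficiency compound. No figure here is Anthropic's; all are invented.

it_power_mw = 100.0   # power drawn by the IT load (servers, accelerators)
pue = 1.5             # roughly the industry average per Uptime Institute

total_power_mw = it_power_mw * pue            # PUE = total power / IT power
overhead_mw = total_power_mw - it_power_mw
print(f"Facility draw: {total_power_mw:.0f} MW, "
      f"of which {overhead_mw:.0f} MW is cooling and overhead")

# A software-stack gain (better kernels, batching, serving) that raises
# tokens per joule by 20% cuts energy per token across the whole facility,
# overhead included, which is why it can rival a hardware refresh.
baseline_tokens_per_joule = 1_000.0           # invented baseline
improved_tokens_per_joule = baseline_tokens_per_joule * 1.20
savings = 1 - baseline_tokens_per_joule / improved_tokens_per_joule
print(f"Energy per token falls by {savings:.1%}")
```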
Inference is now the biggest ongoing cost for model providers. Analyses from industry researchers and investment firms have consistently shown that the bulk of lifecycle compute spending accrues to serving, not training, especially under enterprise SLAs that leave no room for queuing when usage spikes. Whatever the trade-offs, that reality pushes providers toward smarter scheduling, batching, and memory efficiency to wring more tokens from the same cluster.
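A back-of-the-envelope model makes the point; every figure below is invented for illustration and says nothing about Anthropic's actual spending:

```python
# Back-of-the-envelope lifecycle model, all numbers invented for illustration.

training_cost = 100e6          # one-time cost of a large pretraining run ($)
serving_cost_per_month = 15e6  # recurring GPUs, power, and ops for inference ($)

month = 0
while serving_cost_per_month * (month + 1) <= training_cost:
    month += 1
print(f"Cumulative serving spend overtakes the training run in month {month + 1}.")

# Because serving recurs, utilization gains compound: every extra token
# squeezed from the same cluster lowers the unit cost of all future traffic.
tokens_per_month = 1e12        # hypothetical serving volume
unit_cost = serving_cost_per_month / tokens_per_month * 1e6
print(f"At this volume, serving costs ${unit_cost:.2f} per million tokens.")
```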
What Patil Brings to the Job from Hyperscale Operations
Patil brings two decades of experience with large-scale systems. Beyond his stint as CTO of Stripe, he has been a senior vice president for Oracle's cloud infrastructure and held senior engineering roles at Amazon and Microsoft. That mix of hyperscale operations, payments-grade reliability, and enterprise rigor is directly relevant to Anthropic's next stage.
Amodei has highlighted Patil's experience building reliable platforms for businesses, precisely the capability customers will be looking for as they weave Claude into workflows, from code generation to knowledge retrieval.

For his part, Patil praised Anthropic's research direction and its treatment of safety, framing the CTO seat as an opportunity to harden systems while keeping alignment and security top of mind.
Immediate Priorities for Claude’s Performance and Costs
Demand has already exposed pressure points. Anthropic recently introduced rate limits for heavy users of Claude Code to curb always-on background usage that had saturated shared capacity, and usage windows for tiers like Sonnet and Opus were capped to protect peak performance, an operational signal that optimization and capacity planning remain high priorities.
Expect Patil's immediate focus to center on inference economics: maximizing GPU utilization, smarter batching across heterogeneous traffic, serving techniques such as speculative decoding and KV-cache reuse, and quantization wherever accuracy permits. How CUDA/Triton kernels, compiler pipelines, and paging strategies interact can yield step-function improvements without touching the model weights, as recent ML performance benchmarks of tuned end-to-end serving stacks have demonstrated.
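To make the batching point concrete, here is a purely illustrative toy in Python, not any provider's actual scheduler: with continuous batching, the server backfills a slot the moment a sequence finishes instead of waiting for the whole batch, which keeps utilization high when request lengths vary. All names and numbers are invented.

```python
from collections import deque

# Toy comparison of static vs. continuous batching for requests whose
# decode lengths vary widely (heterogeneous traffic). Each "step" stands
# in for one forward pass that decodes one token per active sequence.

requests = [3, 17, 5, 29, 8, 21, 4, 11]  # hypothetical decode lengths
BATCH = 4                                # slots available per step

def static_batching(lengths):
    steps = 0
    for i in range(0, len(lengths), BATCH):
        # The whole batch waits on its longest member before new work starts.
        steps += max(lengths[i:i + BATCH])
    return steps

def continuous_batching(lengths):
    queue, active, steps = deque(lengths), [], 0
    while queue or active:
        # Backfill freed slots immediately instead of waiting for the batch.
        while queue and len(active) < BATCH:
            active.append(queue.popleft())
        steps += 1
        active = [r - 1 for r in active if r > 1]
    return steps

print("static:    ", static_batching(requests), "steps")
print("continuous:", continuous_batching(requests), "steps")
```

The gap widens as request lengths grow more uneven, which is exactly the "heterogeneous traffic" problem the paragraph above describes.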
On the training side, McCandlish's focus on pretraining and large-scale runs should dovetail with infrastructure improvements: better data pipelines, checkpointing that shortens stalls, and elastic cluster management that cuts idle cycles. The aim is to shrink the lag between a research insight and a production model while continuing to serve enterprises already running Claude in production.
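One illustration of the stall-shrinking idea is asynchronous checkpointing: the training loop pauses only for a cheap in-memory snapshot, while the slow durable write runs in a background thread. The sketch below is a generic plain-Python toy, not Anthropic's implementation, and every name in it is made up for the example.

```python
import copy
import pickle
import threading

# Toy async checkpointing: the hot loop blocks only for a memory copy;
# the slow disk write happens concurrently so compute is not left idle.

def save_async(state, path):
    snapshot = copy.deepcopy(state)      # brief pause: in-memory copy only
    def _write():
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)     # slow I/O runs off the hot path
    t = threading.Thread(target=_write)
    t.start()
    return t                             # join before taking the next snapshot

state = {"step": 0, "weights": [0.0] * 1_000_000}
writer = None
for step in range(1, 101):
    state["step"] = step                 # stand-in for one training step
    if step % 50 == 0:                   # checkpoint interval
        if writer:
            writer.join()                # make sure the prior write finished
        writer = save_async(state, f"ckpt_{step}.pkl")
if writer:
    writer.join()
```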
What's at Stake Strategically for Enterprise AI
Enterprise adoption is built on trust: uptime, predictable latency, predictable cost. For buyers comparing providers while GPUs remain scarce and energy constraints tighten, a CTO who lives at the intersection of capacity planning and developer experience can make all the difference. Reliable throughput, graceful degradation under load, and observability are becoming almost as important as benchmark scores.
Anthropic's leadership reshuffle gives each side of the AI value chain a clear owner: research velocity and infrastructure resilience. If Patil drives down cost per token and McCandlish accelerates training, Anthropic can build at a discount rather than buy its way to scale. In a market where winners combine frontier research with industrial-grade operations, that balance may be the most durable moat.
