Resolve AI, a startup building autonomous incident response for site reliability teams, has closed a $125 million Series A that sets the company’s valuation at $1 billion. The financing underscores accelerating investor appetite for AI systems that can prevent and remediate outages across modern, cloud-native infrastructure.
Why This Funding Matters For AI SRE and Reliability
The stakes for reliability have never been higher. The Uptime Institute has reported that more than half of significant outages now exceed $100,000 in total cost, while IBM’s 2023 Cost of a Data Breach study pegs the average breach at $4.45 million. With sprawling microservices and multi-cloud architectures raising operational complexity, enterprises are looking for tools that can shrink mean time to resolution (MTTR), cut alert noise, and automate the drudgery SREs call “toil.”
That backdrop explains the size of Resolve AI’s debut round. In a market where many AI companies pitch copilots that suggest fixes, buyers are pushing for closed-loop systems that can safely execute remediation. The pitch is simple: fewer pages to on-call engineers, faster rollback when a deployment goes sideways, and guardrails that keep compliance teams comfortable.
What Resolve AI Actually Does for Incident Response
Resolve AI positions itself as an “AI SRE” platform that sits atop existing observability and incident management stacks. It ingests telemetry from tools like Splunk, Datadog, Grafana, and cloud provider services, correlates signals with configuration and deployment data, and then proposes or executes runbooks via integrations with Kubernetes, Terraform, and ticketing systems.
Under the hood, the company blends retrieval-augmented generation with policy-aware agents. In practice, that means the system grounds an LLM on customer runbooks, past incidents, and topology graphs before taking action, and it enforces change windows, approvals, and rollback logic. Human-in-the-loop approval remains the default for high-risk steps, and execution paths are captured for the audit trails that SOC 2 and ISO 27001 programs require.
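To make the idea of a policy-aware agent concrete, here is a minimal Python sketch of a gate that checks a proposed remediation against a change window, an approval requirement, and the presence of a rollback step, and records each decision for audit. It is an illustration only and not Resolve AI's implementation: the names `Action`, `Policy`, and `decide`, the two risk levels, and the rule that an action without a rollback is denied are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    """A proposed remediation step (hypothetical shape)."""
    name: str
    risk: str                 # assumed levels: "low" or "high"
    rollback: Optional[str]   # name of the step that undoes this action


@dataclass(frozen=True)
class Policy:
    """A same-day change window; windows that wrap past midnight are not handled."""
    window_start: time
    window_end: time


def decide(
    action: Action,
    policy: Policy,
    now: datetime,
    audit: List[Tuple[str, str, str]],
    approved: bool = False,
) -> str:
    """Return "execute", "needs_approval", or "deny", and log the decision."""
    if action.rollback is None:
        # Assumed rule: nothing runs unless it can be undone.
        decision = "deny"
    elif not (policy.window_start <= now.time() <= policy.window_end):
        decision = "deny"
    elif action.risk == "high" and not approved:
        # Human-in-the-loop is the default for high-risk steps.
        decision = "needs_approval"
    else:
        decision = "execute"
    # Append-only record of (timestamp, action, decision) for later audit.
    audit.append((now.isoformat(), action.name, decision))
    return decision
```

In this sketch a low-risk action inside the window runs immediately, a high-risk one waits for a human unless `approved=True` is passed, and every call leaves an entry in `audit` regardless of the outcome.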
The need is pervasive. The Cloud Native Computing Foundation has reported near-ubiquitous Kubernetes adoption, which helps teams ship faster but also multiplies failure modes—from misconfigured service meshes to noisy autoscaling. By pairing pattern-matching across logs and traces with automation against cluster and cloud APIs, Resolve AI aims to turn tribal knowledge into repeatable, testable workflows.
Who Is Backing The Bet on Autonomous Remediation
The round is led by Lightspeed Venture Partners, joined by existing backers including Greylock Partners, Unusual Ventures, Artisanal Ventures, and A*. Investors are betting on founders with deep observability DNA: Resolve AI was started in 2024 by former Splunk executives Spiros Xanthos and Mayank Agarwal. Their prior company, Omnition, focused on distributed tracing and was acquired by Splunk in 2019.
That history matters commercially. If Resolve AI can interoperate cleanly with the monitoring stacks companies already own—and demonstrate measurable reductions in false positives, ticket volume, and MTTR—it can land quickly without forcing rip-and-replace. Early go-to-market efforts are likely to target regulated industries and large cloud-native engineering teams that run 24×7 on-call rotations.
The Competitive Landscape in AI-Driven Remediation
AI-driven remediation is emerging as a distinct category. Sequoia-backed Traversal is also applying agents to outage detection and resolution. Incumbents are circling too: PagerDuty, Datadog, ServiceNow, Dynatrace, and hyperscalers have rolled out generative AI assistants to accelerate triage and automate runbooks. The differentiation battleground is shifting from chat-based guidance toward verifiable, low-latency actions with enterprise-grade guardrails.
To win, vendors must prove real-world reliability. Buyers will look for concrete metrics such as alert deduplication rates, safe-rollback success, change failure rates, and the share of incidents fully closed without human intervention. Referenceable integrations with OpenTelemetry, strong RBAC, and red-teaming against prompt injection and escalation-of-privilege pathways are becoming table stakes.
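As a rough illustration of how a buyer might compute some of these metrics from incident records, the following Python sketch derives an alert deduplication rate, a safe-rollback success rate, and the share of incidents closed without human intervention. The `Incident` record and the exact metric definitions are assumptions made for the example; vendors and buyers may define each figure differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Incident:
    """Hypothetical per-incident record."""
    raw_alerts: int            # alerts fired before deduplication
    deduped_alerts: int        # alerts remaining after deduplication
    rollback_attempted: bool
    rollback_succeeded: bool
    closed_without_human: bool


def summarize(incidents: List[Incident]) -> Dict[str, float]:
    """Compute three buyer-facing metrics over a non-empty list of incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    raw = sum(i.raw_alerts for i in incidents)
    deduped = sum(i.deduped_alerts for i in incidents)
    attempts = sum(i.rollback_attempted for i in incidents)
    successes = sum(
        i.rollback_attempted and i.rollback_succeeded for i in incidents
    )
    return {
        # Fraction of raw alerts removed by deduplication.
        "dedup_rate": 1 - deduped / raw if raw else 0.0,
        # Fraction of attempted rollbacks that succeeded.
        "rollback_success_rate": successes / attempts if attempts else 0.0,
        # Fraction of incidents resolved with no human involvement.
        "autonomous_share": sum(i.closed_without_human for i in incidents)
        / len(incidents),
    }
```

Aggregating alert counts across incidents before dividing, rather than averaging per-incident rates, keeps one noisy incident from being weighted the same as a quiet one; that is a design choice of this sketch, not an industry standard.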
What To Watch Next for AI SRE and Automation
One key signal will be how quickly Resolve AI expands beyond runbook automation into proactive prevention, such as drift detection, anomaly-driven canarying, and capacity forecasting. Another will be whether it inks strategic partnerships with cloud providers or major observability platforms. Given consolidation trends, acquisition interest from monitoring or ITSM giants is plausible if the product demonstrates durable reductions in toil and on-call load.
For now, the message to SRE leaders is clear: autonomous remediation is moving from slideware to production. If Resolve AI converts this funding into dependable, auditable actions across complex stacks, the company will have earned more than a unicorn label—it will have changed the default response to outages from “wake someone up” to “let the system fix itself.”