Milestone has raised $10 million in seed funding to answer the question every engineering leader is now being asked by the boardroom: is generative AI really paying off?
The Israeli startup is building a measurement layer that connects usage of code assistants (and other AI tools) directly to delivery outcomes, turning AI hype into defensible ROI.

The round also features strategic investment from Atlassian Ventures, and an A-list of operators and investors that includes GitHub cofounder Tom Preston-Werner, former AT&T chief John Donovan, Accenture technology advisor Paul Daugherty, and ex-Datadog president Amit Agrawal.
What Milestone actually measures across teams and tools
Milestone’s product ingests signals from four areas (codebases, project management systems, team structure, and the AI tools themselves) to build what the company calls a genAI data lake. The point is correlation, not vanity metrics: which teams are using which AI assistants to do what, and with what impact on cycle time, code quality, and incident trends.
Rather than counting prompts or generated lines, Milestone aligns usage with well-known engineering benchmarks like deployment frequency, lead time for changes, change failure rate, and mean time to recovery — the “DORA” metrics advanced in the Accelerate research. It can signal whether a surge in regressions came from AI-generated code paths, or whether a new assistant shortened the time to ship a feature without increasing escaped defects.
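For readers less familiar with the DORA terminology, the four metrics reduce to simple arithmetic over deployment and incident records. The sketch below is purely illustrative and assumes an invented record shape, not Milestone’s actual schema:

```python
from datetime import datetime
from statistics import mean

# Hypothetical deployment records: (commit_time, deploy_time, caused_failure, restored_time)
deployments = [
    (datetime(2024, 6, 3, 9), datetime(2024, 6, 3, 15), False, None),
    (datetime(2024, 6, 4, 10), datetime(2024, 6, 5, 11), True, datetime(2024, 6, 5, 13)),
    (datetime(2024, 6, 6, 8), datetime(2024, 6, 6, 12), False, None),
]
window_days = 7  # observation window for the sample above

# Deployment frequency: deploys per day over the window
deployment_frequency = len(deployments) / window_days

# Lead time for changes: average commit-to-deploy delay, in hours
lead_time_hours = mean(
    (deploy - commit).total_seconds() / 3600 for commit, deploy, _, _ in deployments
)

# Change failure rate: share of deployments that triggered an incident
change_failure_rate = sum(failed for _, _, failed, _ in deployments) / len(deployments)

# Mean time to recovery: average failure-to-restore delay, in hours
recoveries = [
    (restored - deploy).total_seconds() / 3600
    for _, deploy, failed, restored in deployments
    if failed and restored is not None
]
mttr_hours = mean(recoveries) if recoveries else 0.0

print(deployment_frequency, lead_time_hours, change_failure_rate, mttr_hours)
```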
That level of attribution becomes even more critical as AI tooling scales from autocomplete to chat to agentic workflows. By pinpointing the “where” and “how” of impact, the platform gives leaders the evidence to scale a tool across teams — or rein it in.
Why it’s hard to prove real AI ROI inside engineering orgs
Developer AI is everywhere — GitHub says it has tens of millions of Copilot users — but it is often adopted without governance, let alone measurement. CFOs want ROI in months; engineering leaders have to contend with tool sprawl, shadow usage, and uneven team maturity. Global consultancy surveys regularly show that organizations are adopting AI faster than they are defining what success looks like, which raises the question of whether the promised productivity boom is actually materializing.
Milestone’s bet is that the right denominator for AI ROI isn’t time saved typing but business-relevant throughput and reliability. In practice, that means connecting assistant use to story points delivered, pull request review time and size, rework rates, and post-release defects — measurements that stand up to scrutiny from finance and play well with platform teams.
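As a rough illustration of what that join looks like, the sketch below groups hypothetical per-pull-request records by whether an assistant was involved; the field names are invented for the example, not Milestone’s:

```python
import pandas as pd

# Hypothetical per-PR records; real signals would come from the SCM,
# the project tracker, and assistant telemetry.
prs = pd.DataFrame([
    {"ai_assisted": True,  "review_hours": 3.5, "rework_commits": 1, "post_release_defects": 0},
    {"ai_assisted": True,  "review_hours": 5.0, "rework_commits": 2, "post_release_defects": 1},
    {"ai_assisted": False, "review_hours": 6.5, "rework_commits": 1, "post_release_defects": 0},
    {"ai_assisted": False, "review_hours": 8.0, "rework_commits": 3, "post_release_defects": 1},
])

# Side-by-side medians: the kind of comparison a finance review asks for.
summary = prs.groupby("ai_assisted")[
    ["review_hours", "rework_commits", "post_release_defects"]
].median()
print(summary)
```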

Enterprise DNA and the ecosystem bet behind Milestone’s play
The startup is doubling down on enterprises, deprioritizing smaller customers to focus on roadmap items that only large organizations demand. With Atlassian Ventures on its cap table, and partners including GitHub, Augment Code, Qodo, Continue, and Atlassian’s Jira (among many others), Milestone is positioning itself as the neutral analytics layer across a fragmented AI tooling stack.
The founding team’s bridge between academia and industry is evident in the product, which emphasizes sound experimental design for pilots and rollouts.
How enterprise buyers can make AI measurement actually work
Early enterprise users are turning to Milestone to run pilots before an organization-wide launch. A typical approach: put A/B teams on different assistants, standardize guardrails, then test whether the differences in cycle time, review throughput, and post-merge defect rates are statistically significant. If the results clear an ROI threshold, procurement scales licenses; if they don’t, budget is reallocated to higher-yield use cases.
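A minimal sketch of that comparison, assuming made-up cycle-time samples for the two pilot teams (the product’s own statistics are not public):

```python
from scipy.stats import mannwhitneyu

# Hypothetical cycle times (hours from first commit to merge) for two pilot teams
# on different assistants; a real pilot would pull these from delivery data.
team_a_cycle_hours = [22, 30, 18, 41, 27, 35, 25, 19, 33, 28]
team_b_cycle_hours = [36, 44, 29, 51, 39, 47, 33, 42, 38, 45]

# Rank-based test: cycle times are rarely normally distributed,
# so a non-parametric comparison is safer than a t-test.
stat, p_value = mannwhitneyu(team_a_cycle_hours, team_b_cycle_hours, alternative="two-sided")
print(f"U={stat:.1f}, p={p_value:.4f}")  # a small p-value suggests a real difference
```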
Another pattern is risk containment. By tracing defects to their source, leaders can tune where AI is allowed (e.g., test generation, boilerplate code) and where it is not (e.g., security-critical modules). The data can also be used to right-size licenses, matching premium seats to workflows that clearly benefit.
Compliance and change management are another reason this instrumentation matters. With an auditable record of how AI contributes to releases and incidents, enterprises can satisfy internal governance standards while giving teams feedback loops that aren’t based on anecdote.
The bottom line on measuring AI impact with engineering data
The market no longer rewards adopting AI for its own sake; it rewards measurable improvement. By joining usage telemetry with the metrics engineering leaders already track, Milestone is trying to make AI spend defensible to finance and actionable for platform teams. If the company can keep pace with the rapid move toward agentic dev tools, its pitch is simple: less arguing about what AI is worth, more evidence in production.
