Google’s lead on developer tools, Ryan Salva, isn’t just shipping AI features; he’s reimagining how code gets made. With Gemini CLI and Gemini Code Assist, his team is operationalizing “agentic” programming: systems that don’t merely propose snippets but draft work plans, call tools, run tests, open pull requests, and iterate until something actually ships. It’s a ground-level look at AI coding that begins at the terminal and culminates in a merged PR.
From code autocomplete to autonomous AI agents
AI has gone from autocompletion to autoexecution. The new minimum is tool-calling: models that grep a repository, compile a target, run unit and integration tests, and fix their own mistakes. That closed loop transforms large language models from suggestion engines into doers. Salva’s team has embraced this with Gemini’s built-in actions, which chain plans, run commands, and verify outcomes rather than stopping at “looks right.”
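To make that closed loop concrete, here is a minimal sketch of a plan-execute-verify cycle. The `call_model` stub and the helpers around it are hypothetical stand-ins, not the Gemini CLI’s actual internals; only the shape of the loop is the point.

```python
import subprocess

def call_model(prompt: str) -> str:
    """Placeholder for a model call; returns a unified diff. Wire up a real client here."""
    raise NotImplementedError

def apply_patch(diff: str) -> None:
    """Apply the model's proposed change with git, keeping every step reversible."""
    subprocess.run(["git", "apply", "-"], input=diff, text=True, check=True)

def run_tests() -> tuple[bool, str]:
    """Deterministic verification: the test suite decides, not the model."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agent_loop(task: str, max_iters: int = 5) -> bool:
    """Plan -> act -> verify -> iterate until the change actually works."""
    prompt = task
    for _ in range(max_iters):
        apply_patch(call_model(prompt))
        ok, log = run_tests()
        if ok:
            return True                              # green: safe to open a PR
        prompt = f"{task}\n\nTests failed:\n{log}"   # feed the failure back in
    return False
```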

The move is driven by reliability. GitHub’s controlled experiment found that developers completed tasks 55% faster with an AI pair-programmer, but speed is only a feature if the code passes CI. Working code needs a planner, an executor, and a verifier. Gemini’s agents therefore stress deterministic steps (compiling, testing, and linting before a PR is written) so that suggestions are tethered to actual feedback, not vibes.
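One way to picture the verifier’s role is as a hard gate in front of the PR step. The commands below are generic stand-ins for whatever build, test, and lint steps a given repo defines:

```python
import subprocess

# Deterministic gates that must all pass before the agent may open a PR.
# The specific commands are illustrative; a real repo defines its own.
GATES = [
    ["make", "build"],
    ["pytest", "-q"],
    ["ruff", "check", "."],
]

def verified() -> bool:
    """Run every gate; a single failure blocks the PR."""
    for cmd in GATES:
        if subprocess.run(cmd).returncode != 0:
            print(f"gate failed: {' '.join(cmd)}")
            return False
    return True

if verified():
    # Only now does the agent draft a pull request description and push.
    print("all gates green; opening PR")
```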
A requirements-first workflow for AI-assisted coding
Salva’s own workflow begins where many teams stumble: vague, under-specified tickets. Using Gemini CLI, he transforms a fuzzy problem into a concrete Markdown spec covering acceptance criteria, a test plan, dependencies, and edge cases. The model then produces code against those requirements and the team’s documented conventions. Every successful step can become a commit and an open pull request, creating a reversible audit trail.
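As an illustration of what such a spec might contain, a small scaffolding function can stamp out the skeleton the model then fills in. The section names simply mirror the list above; they are an assumption, not a published template:

```python
from pathlib import Path

# Hypothetical spec skeleton, mirroring the article's list of sections.
SPEC_TEMPLATE = """# {title}

## Problem
{problem}

## Acceptance criteria
- [ ] ...

## Test plan
- ...

## Dependencies
- ...

## Edge cases
- ...
"""

def scaffold_spec(title: str, problem: str, out_dir: str = "specs") -> Path:
    """Turn a fuzzy ticket into a concrete, reviewable Markdown spec file."""
    path = Path(out_dir) / f"{title.lower().replace(' ', '-')}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(SPEC_TEMPLATE.format(title=title, problem=problem))
    return path

print(scaffold_spec("Retry flaky uploads", "Uploads fail intermittently on slow links."))
```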
This “docs-as-code” paradigm turns IDE time on its head. The terminal is where planning and execution happen, while for many developers the IDE becomes a place to read code rather than write it. It also makes code reviews cleaner: a reviewer can verify a clear statement of intent, a focused patch, and the test results, rather than parse a pile of speculative edits.
Why context is the secret sauce for reliable agents
Good agents don’t guess tokens; they respect the house rules. Gemini Code Assist consumes repo context — test playbooks in Markdown for what to test, dependency policies, security requirements — and uses retrieval to pull the correct snippets into the prompt window. That cuts down on hallucinations and better aligns suggestions with the way your team actually ships software.
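A toy version of that retrieval step might look like the sketch below. The keyword-overlap scoring is deliberately naive and purely for illustration; production systems typically use embeddings, but the budget-bounded packing is the essential idea:

```python
from pathlib import Path

def score(query: str, text: str) -> int:
    """Naive relevance: count query words present in the file."""
    return sum(1 for w in set(query.lower().split()) if w in text.lower())

def pack_context(query: str, repo: str, budget_chars: int = 8000) -> str:
    """Pull the most relevant repo snippets into a bounded prompt window."""
    docs = []
    for path in Path(repo).rglob("*.md"):   # playbooks, policies, conventions
        text = path.read_text(errors="ignore")
        docs.append((score(query, text), path, text))
    docs.sort(reverse=True, key=lambda d: d[0])
    out, used = [], 0
    for s, path, text in docs:
        if s == 0 or used + len(text) > budget_chars:
            continue                         # skip irrelevant or over-budget files
        out.append(f"--- {path} ---\n{text}")
        used += len(text)
    return "\n".join(out)
```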
Security and governance are built in. Least-privilege tokens, ephemeral sandboxes, and policy checks (from licensing constraints to SBOM-aware dependency guards) ensure that the automated changes themselves don’t create new risks. In practice, Gemini’s behavior can be parameterized: run tests here, patch only these paths, ask a human to review before touching infra code. The effect is speed without giving up control.
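In code, such guardrails can be as simple as a declarative policy the agent checks before acting. The field names below are hypothetical, not an actual Gemini configuration schema:

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch

@dataclass
class AgentPolicy:
    """Hypothetical guardrail config; not a real Gemini schema."""
    writable_paths: list[str] = field(default_factory=lambda: ["src/**", "tests/**"])
    human_review_paths: list[str] = field(default_factory=lambda: ["infra/**", "deploy/**"])
    must_run_tests: bool = True

    def may_patch(self, path: str) -> bool:
        """Patch only these paths; everything else is off-limits."""
        return any(fnmatch(path, pat) for pat in self.writable_paths)

    def needs_human(self, path: str) -> bool:
        """Require sign-off before the agent touches infra code."""
        return any(fnmatch(path, pat) for pat in self.human_review_paths)

policy = AgentPolicy()
for f in ["src/api.py", "infra/main.tf"]:
    print(f, "patchable:", policy.may_patch(f), "human review:", policy.needs_human(f))
```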
“Working code,” not keystrokes, is what counts
Salva’s crew treats evaluation as a product feature. Beyond acceptance rates, they monitor first-try build passes, test pass ratios, reversion rate, and time to a valid PR, all of which align with the DORA and SPACE frameworks. They keep “golden tasks” that reflect actual workflows, such as fixing a flaky test, upgrading a dependency without breaking ABI, or refactoring a service without losing observability.
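Computing those metrics over a log of agent-generated PRs is straightforward; the record fields here are invented for illustration:

```python
# Invented PR records, purely for illustration: each logs whether the first CI
# build passed, whether the change was later reverted, and hours to a valid PR.
prs = [
    {"first_build_green": True,  "reverted": False, "hours_to_valid_pr": 1.5},
    {"first_build_green": False, "reverted": False, "hours_to_valid_pr": 4.0},
    {"first_build_green": True,  "reverted": True,  "hours_to_valid_pr": 2.0},
]

def rate(flag: str) -> float:
    """Share of PRs where the given boolean field is true."""
    return sum(p[flag] for p in prs) / len(prs)

mean_hours = sum(p["hours_to_valid_pr"] for p in prs) / len(prs)
print(f"first-try build pass rate: {rate('first_build_green'):.0%}")
print(f"reversion rate:            {rate('reverted'):.0%}")
print(f"mean time to valid PR:     {mean_hours:.1f}h")
```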

Independent research backs up the upside when guardrails are in place. McKinsey estimates that generative AI can boost developer productivity by 20–45% on some tasks, but the gains concentrate where teams invest in evaluation harnesses, policy, and context. The real test, though, is moving beyond benchmarks like SWE-bench to production telemetry (did the PR ship and stay green?); that is where an assistant proves it’s paying for its seat.
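That “did it stay green?” question can be asked directly of production telemetry. A hedged sketch, assuming you already export merge and revert events to some store:

```python
from datetime import datetime, timedelta

# Illustrative telemetry records: merge time plus any later revert time.
merges = [
    {"pr": 101, "merged": datetime(2024, 5, 1), "reverted": None},
    {"pr": 102, "merged": datetime(2024, 5, 2), "reverted": datetime(2024, 5, 3)},
]

def stayed_green(window_days: int = 7) -> float:
    """Fraction of merged PRs not reverted within the observation window."""
    ok = sum(
        1 for m in merges
        if m["reverted"] is None
        or m["reverted"] - m["merged"] > timedelta(days=window_days)
    )
    return ok / len(merges)

print(f"stayed green after merge: {stayed_green():.0%}")
```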
Latency, cost, and optimization of control loops
Agentic coding isn’t free. Every tool call adds latency, and tokens cost money. Google’s tactics are borrowed from distributed systems: cache intermediate results, batch tool invocations, diff rather than rewrite entire files, and pin model versions and prompts for reproducibility. Smaller local models can handle linting and style, while larger reasoning models take on design changes and complex refactors, scaling the compute to the job.
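A routing-and-caching sketch shows the shape of that optimization; the model names and the `complete` stub are placeholders, not specific products:

```python
import hashlib

CACHE: dict[str, str] = {}   # prompt-hash -> completion, so loops don't re-pay for tokens

def complete(model: str, prompt: str) -> str:
    """Placeholder model client; wire up a real API here."""
    return f"[{model}] response to {prompt[:30]}..."

def route(task_kind: str) -> str:
    """Cheap mechanical work goes to a small model, hard reasoning to a large one."""
    return "small-fast-model" if task_kind in {"lint", "style"} else "large-reasoning-model"

def ask(task_kind: str, prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in CACHE:                      # cache intermediate results
        CACHE[key] = complete(route(task_kind), prompt)
    return CACHE[key]

print(ask("lint", "Fix import order in src/app.py"))
print(ask("refactor", "Split the billing service into two modules"))
```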
The ecosystem matters, too. Salva works across a heterogeneous stack (VS Code, Zed, Cursor, Windsurf) because developers never live in one tool. The CLI is the lingua franca that ties it all together, backed by CI/CD and policy. That portability keeps a team’s process from being locked to a single editor plug-in.
What the developer’s role becomes in an AI workflow
The role shifts to architecture, decomposition, and verification rather than raw keystrokes, and clear code remains the standard.
You spend more time writing tight requirements, reading diffs, designing tests, and enforcing observability, and less time hand-typing boilerplate.
Code doesn’t vanish; it gets mediated: generated by an assistant, vetted by CI, and approved by humans who understand the system’s intent and limitations.
That doesn’t devalue software engineering; it sets a new bar. The reward is speed with traceability: less context-switching, better commits, and changes that survive in production. That, essentially, is what Salva’s roadmap optimizes for. AI coding “works” when the loop closes, the tests pass, the PR merges, and teams can move on to the next hard problem.
