OpenAI’s decision to take on a Pentagon contract after Anthropic walked away has triggered a blunt realization across Washington and Silicon Valley: there is no coherent plan for how frontier AI firms should work with the U.S. government, especially on defense. The result is a policy vacuum where ethical red lines, procurement rules, and workforce norms are being improvised in public — and everyone looks unprepared.
Why There Is No Playbook for Government AI Partnerships
General-purpose AI has outgrown the tech industry’s social-media era posture of big promises, soft guardrails, and regulatory courtship. When models become infrastructure for intelligence analysis, logistics, and targeting support, the stakes look more like aerospace than apps. Yet neither side has translated that reality into predictable processes. Tech leaders default to “leave it to elected officials,” while officials lean on ad hoc pressure instead of codified standards, leaving companies to guess what is acceptable from one week to the next.
The Anthropic flashpoint over Pentagon contract limits
The most visible rupture centers on Anthropic, which sought contractual limits on surveillance and weaponization before exiting a Defense Department deal. The subsequent threat by the Defense Department’s leadership to label the company a supply-chain risk — a move that could cut it off from chips and cloud hosting — is extraordinary. Former administration officials warn that such a designation would function as a corporate death sentence, even if later overturned, and would chill every vendor negotiating safety clauses with the government.
More fundamentally, shifting terms mid-contract undermines the trust that complex programs require. In the private sector, unilaterally rewriting scope gets you sued. In national security, it scares off precisely the advanced suppliers the government says it needs. The message heard across boardrooms is simple: your values may be negotiable, but the government’s demands are not.
OpenAI’s tightrope and workforce risk in defense work
OpenAI now inherits a delicate balancing act. Users and employees expect enforceable red lines on lethal autonomy and mass surveillance; political actors expect full-spectrum alignment. History suggests the internal pressure is real: Google’s Project Maven revolt in 2018 forced the company to step back from certain defense work, while firms like Palantir and Anduril embraced the mission and the politics that come with it. The difference today is that foundation model providers straddle consumer and defense markets at once, compounding reputational and retention risks.
What the government already has in AI risk governance
To be fair, there are building blocks. The National Institute of Standards and Technology released the AI Risk Management Framework, now widely cited but voluntary. The Defense Department’s Responsible AI Tenets and its updated directive on autonomy in weapon systems require human judgment and testing before deployment. The White House issued a sweeping AI executive order, and the Office of Management and Budget directed agencies to appoint Chief AI Officers, inventory AI use, and publish safeguards for safety-impacting systems.
Yet none of this answers the central contracting questions for frontier models: what testing is required before a model touches targeting workflows, how quickly weights or prompts may change after accreditation, where logs are stored and who can audit them, and what happens when a vendor refuses a use case on ethical grounds. The frameworks exist; the binding terms do not.
Procurement is the bottleneck for frontier AI contracts
Federal acquisition rules excel at buying aircraft carriers, not APIs that update weekly. Foundation models blur lines between product and service, between unclassified outputs and classified inferences. Export controls complicate model weight access, while continuous deployment collides with static authority-to-operate regimes. Without standardized clauses, every negotiation becomes a culture war wrapped in a statement of work.
A practical path forward for safe, contractual AI adoption
First, establish a national template for “safety-critical AI” contracts. Map technical requirements to the NIST framework and the Defense Department’s autonomy directive: pre-deployment evaluation on mission-relevant benchmarks, rigorous red-teaming, incident reporting within set timeframes, immutable logging, and mandatory human-in-the-loop for designated functions.
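Two of those requirements, a human-in-the-loop gate for designated functions and tamper-evident logging, can be made concrete in a short sketch. The Python below is purely illustrative, assuming a hypothetical design: the names `AuditLog`, `DESIGNATED_FUNCTIONS`, and `gate_action` are invented here, and hash-chaining is just one minimal way to approximate "immutable logging," not any agency's actual system.

```python
import hashlib
import json


class AuditLog:
    """Append-only log in which each entry chains the SHA-256 hash of the
    previous entry, so after-the-fact tampering is detectable. A minimal
    stand-in for an 'immutable logging' clause, not a production design."""

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"event": event, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        prev = "0" * 64
        for entry in self.entries:
            payload = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


# Hypothetical set of functions a contract designates as requiring
# explicit human sign-off before any model output takes effect.
DESIGNATED_FUNCTIONS = {"targeting_support"}


def gate_action(function: str, output: str, human_approved: bool,
                log: AuditLog) -> bool:
    """Release an output only if the function is not designated, or a
    human has explicitly approved it; log every decision either way."""
    released = human_approved or function not in DESIGNATED_FUNCTIONS
    log.append({"function": function, "output": output, "released": released})
    return released
```

In this sketch an auditor can call `verify()` at any time: if any logged decision was altered after the fact, the hash chain breaks and the check fails.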
Second, create independent evaluation capacity. A joint NIST–Defense testing center could accredit evaluation suites the way FedRAMP standardizes cloud security or FIPS 140-3 validates cryptography. Vendors would earn reusable attestations instead of negotiating bespoke proof each time.
Third, define bright lines and appeal rights. Prohibit fully autonomous targeting decisions and bulk, suspicionless surveillance by contract, echoing existing defense policy. Pair that with a structured “conscience clause” allowing vendors to decline specified uses without retaliation, and a change-of-law mechanism to reopen terms if policy shifts between administrations.
Fourth, adapt industrial security to models. Treat model weights like controlled technical data with tiered access, while using foreign-ownership mitigation tools that preserve corporate governance independence. This borrows from long-standing defense rules without turning AI labs into de facto government bureaus.
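The tiered-access idea can be stated as a single monotonic rule, the same one long used for controlled technical data: a requester's clearance must meet or exceed the asset's tier. The sketch below is illustrative only; the tier names and asset mapping (`Tier`, `ASSET_TIERS`, `may_access`) are hypothetical, not drawn from any real classification scheme.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical access tiers, ordered from least to most restricted."""
    PUBLIC = 0        # model cards, published evaluation summaries
    CONTROLLED = 1    # API access under contract terms
    RESTRICTED = 2    # raw weights; cleared personnel and facilities only


# Illustrative mapping of program assets to tiers.
ASSET_TIERS = {
    "model_card.md": Tier.PUBLIC,
    "inference_api": Tier.CONTROLLED,
    "model_weights.safetensors": Tier.RESTRICTED,
}


def may_access(clearance: Tier, asset: str) -> bool:
    """Grant access only when the requester's clearance meets or
    exceeds the asset's tier."""
    return clearance >= ASSET_TIERS[asset]
```

The point of encoding the rule this simply is auditability: a contracting officer can check one comparison rather than a bespoke access matrix per program.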
Finally, invest in talent and transparency. Allow limited-term secondments so researchers can serve in the Chief Digital and AI Office and return, create safe-harbor protections for employees who flag harmful deployments, and require public inventories of high-impact governmental AI uses with plain-language risk summaries.
None of this will end the politics. It will, however, replace improvisation with process, trade tweets for test plans, and give companies and agencies a shared map of where the red lines are. The alternative is what we have now: a scramble that endangers innovation, national security, and public trust at the same time.