Amazon Web Services is doubling down on enterprise-ready AI agents with new capabilities for its Amazon Bedrock AgentCore platform that will help make agents easier to govern, test, and personalize. The update brings natural-language policy controls, built-in evaluation suites, and long-lived memory — all aimed at helping organizations scale from proof of concept to production with fewer surprises.
What AWS Announced for Enterprise AI Agents
The marquee feature is Policy in AgentCore, which lets teams specify agent behavior and compliance limits in plain language. Those policies plug into the AgentCore Gateway (the layer that brokers an agent’s access to external tools and data) and inspect every action before it runs. Think of it as an allow/deny layer for tool use and data access, written more like a business rule than a software specification.
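The gateway-level check described above can be sketched in a few lines of Python. Everything here is hypothetical (the rule texts, tool names, and the `gateway_check` function are invented for illustration, not AgentCore's actual API), but it shows the core idea: every proposed tool call is evaluated against declared policies before it executes.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str     # name of the tool the agent wants to invoke
    params: dict  # arguments the agent supplied

# Hypothetical policies: each pairs a plain-language rule with a predicate.
# A real gateway would compile natural-language policy into checks like these.
POLICIES = [
    ("Agents may not issue refunds over $500",
     lambda c: not (c.tool == "issue_refund" and c.params.get("amount", 0) > 500)),
    ("Agents may read, but never delete, customer records",
     lambda c: c.tool != "delete_customer_record"),
]

def gateway_check(call: ToolCall) -> tuple[bool, str]:
    """Allow the call only if every policy predicate passes."""
    for rule_text, predicate in POLICIES:
        if not predicate(call):
            return False, f"Denied by policy: {rule_text}"
    return True, "Allowed"

allowed, reason = gateway_check(ToolCall("issue_refund", {"amount": 900}))
# The $900 refund exceeds the $500 cap, so the gateway denies it.
```

The useful property of this pattern is that the human-readable rule and the enforced predicate travel together, so an audit log can record which business rule blocked an action.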

AgentCore Evaluations ships 13 prebuilt tests that score agents on correctness, safety, tool-selection accuracy, and other dimensions. Rather than building one-off harnesses, developers can start from common evaluation patterns and extend them as needed, bringing a more MLOps-like discipline to agent rollouts. For companies that need to show due diligence (security reviews, internal audit, or risk-model review), a packaged eval can speed up review cycles.
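A minimal harness of this kind might look like the sketch below. The check names, transcript fields, and scoring are invented for illustration; AWS's actual 13 evals are not reproduced here. The point is the pattern: each dimension is a small scoring function over an agent transcript, and the suite is just a dictionary you can extend.

```python
# Hypothetical eval harness: each function scores one dimension of a
# recorded agent transcript. Names and fields are illustrative only.

def eval_tool_selection(transcript: dict) -> float:
    """Fraction of turns where the agent picked the expected tool."""
    expected = transcript["expected_tools"]
    actual = transcript["actual_tools"]
    hits = sum(1 for e, a in zip(expected, actual) if e == a)
    return hits / len(expected) if expected else 1.0

def eval_safety(transcript: dict) -> float:
    """1.0 if no flagged action appears in the transcript, else 0.0."""
    return 0.0 if transcript.get("flagged_actions") else 1.0

SUITE = {"tool_selection": eval_tool_selection, "safety": eval_safety}

def run_suite(transcript: dict) -> dict:
    """Run every registered eval and return a score per dimension."""
    return {name: fn(transcript) for name, fn in SUITE.items()}

report = run_suite({
    "expected_tools": ["search_kb", "create_ticket"],
    "actual_tools": ["search_kb", "close_ticket"],
    "flagged_actions": [],
})
# report == {"tool_selection": 0.5, "safety": 1.0}
```

Extending the suite to a context-specific KPI is then just registering another scoring function, which is the extensibility question raised later in this piece.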
Completing the release, AgentCore Memory lets agents persist granular, user-level context over time (e.g., travel preferences or historical tickets) and draw on it in future interactions. Memory has been a missing piece in many agent deployments: without it, every session is a cold start; with it, teams must manage privacy, policy, and lifecycle explicitly. AWS is effectively productizing that tradeoff inside the agent platform.
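That lifecycle tradeoff can be made concrete with a small sketch: a user-scoped memory store with an explicit retention window and a deletion path. This is not AgentCore's interface; the class and method names are invented to show the privacy and lifecycle obligations that come bundled with persistent memory.

```python
import time

class UserMemory:
    """Hypothetical user-scoped memory with an explicit retention window.
    Illustrates the lifecycle management a platform must handle; not
    AgentCore's actual API."""

    def __init__(self, retention_seconds: float):
        self.retention = retention_seconds
        self._store: dict[str, list[tuple[float, str]]] = {}

    def remember(self, user_id: str, fact: str) -> None:
        """Record a timestamped fact for one user."""
        self._store.setdefault(user_id, []).append((time.time(), fact))

    def recall(self, user_id: str) -> list[str]:
        """Return only facts still inside the retention window."""
        cutoff = time.time() - self.retention
        kept = [(t, f) for t, f in self._store.get(user_id, []) if t >= cutoff]
        self._store[user_id] = kept  # expire stale entries on read
        return [f for _, f in kept]

    def forget(self, user_id: str) -> None:
        """Deletion workflow, e.g. for a GDPR erasure request."""
        self._store.pop(user_id, None)

mem = UserMemory(retention_seconds=3600)
mem.remember("u123", "prefers aisle seats")
```

Note that retention and deletion are first-class operations here; the governance questions discussed later in this piece are about exactly these two methods.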
Why These AWS AgentCore Updates Matter for Businesses
The largest blockers to enterprise agents are not model quality itself but governance, reliability, and integration risk. Tool-use mistakes that write to the wrong system, prompt-injection attacks that exfiltrate data, and “over-achieving” agents that exceed their remit are what stall programs. By pushing policy checks into the gateway layer and shipping standard evals, AWS appears to be targeting these operational, pragmatic concerns rather than model scores alone.
Market context points to growing demand. GenAI adoption was about 72% among respondents to McKinsey’s 2024 State of AI survey, with roughly 60% deploying it in at least one business function. Gartner predicts that by 2026, more than 80% of enterprises will have used GenAI APIs or deployed GenAI-enabled applications in production, up from less than 5% in 2023. As adoption broadens, pressure will grow to standardize guardrails, testing, and auditability.
Memory may also act as a force multiplier for productivity. In customer operations, a service agent that recalls product entitlements, prior troubleshooting steps, and preferred channels could shorten handle times and reduce escalation rates. In knowledge work, a purchasing assistant that remembers vendor limits and policy restrictions can stop nonconforming orders before they reach an approver’s queue.
How the New AWS AgentCore Tools Compare to Rivals
Rivals have increasingly been homing in on the same needs. Google’s Vertex AI Agent Builder emphasizes grounding, evaluation tooling, and safety systems; Microsoft’s Copilot Studio layers governance and connectors across the Microsoft 365 surface; OpenAI’s assistants and GPTs offer actions and memory scoped to tasks. AWS’s angle is to enforce policy checks at the tool gateway and co-locate evaluations with the Bedrock model catalog, positioning AgentCore as the multi-model, multi-tool agent control plane within the AWS stack.

That integration story matters to customers already standardized on AWS identity, networking, and observability. It reduces the glue code teams would otherwise write to connect agents to enterprise policy, logging, and cost controls. What to watch is how cleanly AgentCore's policies map onto existing governance primitives (e.g., role-based access and data classification) without resorting to bespoke mappings.
Early Use Cases for AgentCore and Potential Risks
Early adopters are likely to be focused on bounded, high-volume workflows where memory and policy get clear wins:
- Travel booking with stored preferences and spending caps
- IT service desks that triage and resolve, but only escalate above certain thresholds set by policy
- FinOps agents that process invoices but cannot release payments outside preset size or frequency limits
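The FinOps case above can be made concrete with a small, purely illustrative guard. The caps, field names, and function are invented here, not drawn from any AgentCore API; the point is that "preset limits" reduce to simple, auditable predicates the policy layer can enforce.

```python
# Purely illustrative preset limits for a hypothetical FinOps agent.
MAX_PAYMENT = 10_000   # per-payment cap in dollars (invented)
MAX_DAILY_COUNT = 20   # payments released per day (invented)

def can_release_payment(amount: float, payments_today: int) -> bool:
    """Block any payment outside the preset size or frequency limits."""
    return amount <= MAX_PAYMENT and payments_today < MAX_DAILY_COUNT

decision = can_release_payment(25_000, 3)
# 25,000 exceeds the per-payment cap, so the release is blocked.
```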
Memory raises the governance stakes. Companies will need transparent consent flows, retention windows, and deletion workflows around user-level context, not least under regimes like GDPR and CCPA. Mapping AgentCore Memory and policy to frameworks such as the NIST AI Risk Management Framework and ISO/IEC 23894 will help translate technical controls into auditable practices. Expect security teams to ask for exportable eval reports attached to every agent release, much like change-management evidence for traditional software.
What to Watch Next as AWS Rolls Out AgentCore
The big questions determining uptake will be:
- How easily teams can extend the 13 prebuilt evals to context-specific KPIs
- How granular policy controls can be at the tool and data layer
- What the operational overhead of memory management at scale looks like
Integration breadth (connectors to leading SaaS apps and on-prem data) will also determine whether AgentCore becomes a first-class agent runtime or just one more option in a heterogeneous stack alongside frameworks like LangChain and LlamaIndex.
For now, though, the trend is clear: AWS is moving agent safety, reliability, and personalization out of bespoke code and into first-class platform features. If those controls prove tight yet flexible in real-world use, they could accelerate the shift from experimental chatbots to production agents that enterprises are willing to rely on.