Gemini Managed Agents — Half-Price Harness, Half-Built Stack
Google shipped Managed Agents in the Gemini API at I/O 2026 with aggressive pricing and real infrastructure. The orchestrator model ships next month.
The managed-agent harness race just became a price war — and Google fired first. On May 19 at I/O 2026, Google shipped Managed Agents in the Gemini API with Gemini 3.5 Flash at $1.50/$9.00 per million input/output tokens, roughly an order of magnitude cheaper than Anthropic’s Opus-powered managed agents. The catch: Gemini 3.5 Pro, the model Google designed as the orchestrator for multi-agent chains, doesn’t ship until June. You’re getting the harness and the worker, but the brain that ties production multi-agent deployments together is six weeks out.
TL;DR
- What: Google launched Managed Agents in the Gemini API — single-call provisioned sandboxes for agentic workloads, powered by Gemini 3.5 Flash
- Pricing: $1.50/$9.00 per 1M input/output tokens with 90% context-caching discount — an order of magnitude cheaper than Anthropic’s Opus-based managed agents
- Gap: Gemini 3.5 Pro (the intended orchestrator model) ships next month — right now Flash runs both orchestrator and worker roles
- Action: Evaluate pricing for long-running agent loops now; wait for 3.5 Pro before committing to multi-agent architectures
Managed Agents in the Gemini API — What Happened
I’ve been tracking the managed-harness race since Anthropic launched Claude Managed Agents in April, and Google’s entry is more credible than I expected. The infrastructure underneath isn’t marketing theater. The GKE Agent Sandbox — gVisor kernel isolation with sub-second provisioning — was shipping quietly since Google Cloud NEXT ‘26. What I/O did was expose the developer API on top of that primitive. When Google says “single API call,” they mean it: one request provisions an isolated Linux environment, spins up the Antigravity agent running Gemini 3.5 Flash, and gives it tool access and code execution. No container orchestration on your side, no sandbox management, no infra work. The product is available through the Interactions API and Google AI Studio, with enterprise support on the Gemini Enterprise Agent Platform in private preview.
The benchmark story matters because of what it signals about capability tiers. Gemini 3.5 Flash — a Flash-class model, not Pro-tier — scores 76.2% on Terminal-Bench 2.1 versus 70.3% for Gemini 3.1 Pro, and 83.6% on MCP Atlas versus 78.2% for 3.1 Pro. This is the first time a Flash-class model has outperformed the previous Pro generation on the agentic benchmarks that actually matter for managed-agent workloads. All numbers are self-reported by Google — every vendor self-reports, and these benchmarks align with the pattern of Flash models closing the capability gap we’ve seen across the industry.
Agent behavior is defined through versionable AGENTS.md and SKILL.md files — the same convention as Anthropic’s CLAUDE.md and the broader agents-file pattern emerging across the ecosystem. This is portability-friendly compared to AWS AgentCore’s proprietary configuration layer, though the sandbox and harness infrastructure remain firmly Google’s.
Why This Matters
The pricing math changes the calculus for long-running agent loops. Gemini 3.5 Flash ships at $1.50/$9.00 per million input/output tokens — roughly 40% cheaper than Gemini 3.1 Pro at $2.00/$12.00. But the real story is the 90% context-caching discount: cached input tokens drop to $0.15 per million. For agentic workloads where the agent re-reads the same context across dozens of tool-call rounds, this transforms the economics. Compare this against Claude Opus 4.7 powering Anthropic’s Managed Agents at roughly 10× the per-token cost, and Google is making a clear play: win the harness race on price, not just capability.
This matters because managed agent harnesses are converging on the same architectural pattern. Anthropic, Google, and AWS all now offer some version of “API call → isolated sandbox → agent with tools.” The differentiation is shifting from “can it do agentic work” to “what does it cost to run a 200-turn agent loop in production.” Google is positioning aggressively on that second question.
Standard API accounts hit 429 rate limits almost immediately on managed-agent workloads. Google AI Ultra ($100/month) is the real entry ticket for anything beyond toy experiments. Budget accordingly.
The agents-file convention is solidifying into an industry standard. Google adopting AGENTS.md and SKILL.md alongside Anthropic’s CLAUDE.md means we now have three major providers using Markdown-based agent configuration. This is good for portability — your agent behavior spec isn’t locked to a vendor’s proprietary format. The sandbox and runtime are still vendor-locked, but the behavior layer isn’t. That’s a meaningful distinction for teams hedging across providers.
But there is an architecture gap that teams will hit immediately. Gemini 3.5 Pro — designed as the orchestrator model for multi-agent chains where Flash subagents handle specialized tasks — is confirmed for June 2026 but isn’t available yet. Right now, you’re running Flash as both orchestrator and worker. This works for single-agent workloads, but it’s not the architecture Google designed for production multi-agent patterns. It burns more tokens on orchestration because Flash isn’t optimized for planning across subagent outputs, and anyone building multi-agent systems today will need to re-architect when Pro ships.
That’s a six-week window where the product isn’t whole. Google shipped the harness, shipped the worker model, but the orchestrator that ties multi-agent deployments together is still “coming next month.” If you’re evaluating managed harnesses for a Q3 production deployment, this timeline matters.
The
AGENTS.md+SKILL.mdpattern works independently of Google’s managed harness. Define your agent behavior in these files now — they’ll port to whichever provider you choose, and the convention is converging across Anthropic and Google.
The competitive landscape is clarifying fast. Anthropic Managed Agents offers the strongest model at the highest price. AWS AgentCore offers the deepest cloud integration with the most proprietary config layer. Google Managed Agents offers the most aggressive pricing with mature sandbox infrastructure — but ships without its orchestrator model. Each is making a different bet on what developers will optimize for: capability, ecosystem integration, or cost. My read is that cost wins for the majority of production workloads, which puts Google in a strong position once 3.5 Pro closes the gap.
The Take
Google just did something quietly important: they decoupled the sandbox infrastructure from the model layer. The GKE Agent Sandbox is battle-tested with sub-second provisioning. The model sitting on top — Flash today, Pro next month — is swappable. That architectural decision is smarter than it looks. Anthropic’s managed agents are tightly coupled to Opus. AWS AgentCore is tightly coupled to Bedrock. Google built the harness as a layer that sits underneath whichever Gemini model you pick, and priced the cheapest option aggressively enough to win on volume.
Here’s what I’d actually do: if you’re running single-agent workloads — code execution, research loops, tool-heavy tasks — evaluate Google Managed Agents now. The pricing with context caching is genuinely hard to beat, and Flash’s agentic benchmarks are strong enough for worker-tier tasks. If you’re building multi-agent orchestration, wait for 3.5 Pro in June. Don’t build your orchestration layer on Flash and then re-architect in six weeks. The harness is ready. The orchestrator isn’t. Plan accordingly.
The managed-harness race now has three serious entrants. The winner won’t be determined by benchmarks or features — it’ll be determined by which one developers actually ship production workloads on. Right now, Google’s pricing gives them the clearest path to volume.
Related
- Anthropic Managed Agents — Runtime Lock-In: Anthropic’s managed agent harness ships with Opus 4.7 — strongest model, highest price, tightest coupling
- AWS AgentCore — Managed Harness Lock-In: AWS enters the harness race with deep Bedrock integration and the most proprietary configuration layer
- Google Antigravity 2.0 — Platform Lock-In: The Antigravity agent powering Managed Agents and its evolution from internal tool to developer-facing product