[release] 5 min · Apr 9, 2026

Anthropic Advisor Tool — Pay Opus Only When Needed

Anthropic's Advisor Tool inverts multi-agent cost orthodoxy: Sonnet executes, Opus advises only on hard decisions. +2.7 SWE-bench points, −11.9% cost.

#anthropic #claude #api #agent-cost #multi-agent #agentic-workflows

Anthropic shipped the Advisor Tool in public beta on April 9, 2026. One HTTP header (anthropic-beta: advisor-tool-2026-03-01), one tool definition, and your Sonnet executor can escalate hard decisions to Opus mid-generation — without you building an orchestration layer. The number that matters: Sonnet with an Opus advisor scores 2.7 percentage points higher on SWE-bench Multilingual than Sonnet solo, while costing 11.9% less per agentic task. If you are running everything through Opus today because you cannot afford quality drops on the 5% of decisions that actually need deep reasoning, this changes the math.

TL;DR

  • What: Anthropic’s Advisor Tool lets a Sonnet/Haiku executor call Opus mid-generation only on hard decisions — one beta header, zero orchestration code
  • Numbers: +2.7 points SWE-bench Multilingual and −11.9% cost (Sonnet+Opus vs. Sonnet solo); Haiku+Opus doubled BrowseComp score at 85% lower cost than Sonnet solo
  • Trap: Advisor responses do not stream; Priority Tier does not cascade; advisor tokens are billed separately at Opus rates with no max_tokens cap
  • Action: Benchmark against your existing agentic pipelines before committing — integration cost is near-zero, savings on high-volume workflows are real

What Happened — The Inverted Orchestration Model

Every multi-agent architecture I have seen in production follows the same pattern: a large orchestrator model sits at the top, delegates subtasks to cheaper workers, and you pay orchestrator-tier pricing on every turn regardless of whether the decision was trivial. The Advisor Tool inverts this entirely. Your cheap model — Sonnet or Haiku — runs the full agentic loop. It handles tool calls, file edits, code generation, all of it. Only when it hits a decision it cannot resolve does it escalate to Opus.

Implementation is absurdly simple. Add anthropic-beta: advisor-tool-2026-03-01 to your API request headers. Add one tool definition to your tools array:

{
  "type": "advisor_20260301",
  "name": "advisor",
  "model": "claude-opus-4-6"
}

That is the entire integration. The executor decides when to invoke the advisor. The sub-inference happens server-side within the same /v1/messages request — no separate API call management, no routing logic, no orchestration layer to maintain. Opus typically generates 400 to 700 tokens per consultation, while the executor handles all heavy output at Sonnet or Haiku rates.
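
In practice the request looks like this. A minimal sketch using the Anthropic Python SDK: the beta header and tool strings are the ones from the launch announcement, while the Sonnet model ID and the prompt are illustrative assumptions.

import anthropic

# Sketch: Sonnet executor with a server-side Opus advisor. The beta header
# and tool type are the strings from the launch post; "claude-sonnet-4-6"
# is an assumed executor model ID; check the current model list.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",   # assumed executor model ID
    max_tokens=4096,             # caps executor output only, not advisor tokens
    extra_headers={"anthropic-beta": "advisor-tool-2026-03-01"},
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",  # advisor sub-inference runs server-side
        }
    ],
    messages=[
        {"role": "user", "content": "Refactor the session cache to be thread-safe."}
    ],
)
print(response.content)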

The beta launched with claude-opus-4-6 as the only supported advisor model. Opus 4.7 shipped a week later on April 16 — check current API docs before hardcoding model strings in production, as advisor model support may have expanded since the initial launch.
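
One cheap way to follow that advice is to keep the advisor model ID out of your source entirely. A trivial sketch, assuming an environment-variable convention of your own choosing (ADVISOR_MODEL is not an official variable):

import os

# Hypothetical convention: pin the advisor model in deploy config, not code,
# so an Opus version bump is a config change rather than a code change.
ADVISOR_MODEL = os.environ.get("ADVISOR_MODEL", "claude-opus-4-6")

advisor_tool = {
    "type": "advisor_20260301",
    "name": "advisor",
    "model": ADVISOR_MODEL,
}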

Why This Matters

The standard multi-agent cost compromise has always been a false binary: run Opus on everything and accept the bill, or run Sonnet and accept quality drops when reasoning actually matters. Most agentic workloads sit somewhere ugly in between — 85% of turns are mundane tool calls where Sonnet is more than sufficient, but the remaining 15% contain architecture decisions, complex debugging, or multi-step planning where the quality gap between Sonnet and Opus is measurable.

The Advisor Tool breaks that lock by being structurally honest about where intelligence is needed. Anthropic’s own benchmarks back this up concretely. On SWE-bench Multilingual — 300 problems across 9 languages, averaged over 5 trials — Sonnet 4.6 with an Opus advisor outperformed Sonnet 4.6 solo by 2.7 percentage points while costing 11.9% less per agentic task. The Haiku numbers are even more dramatic: Haiku with an Opus advisor hit 41.2% on BrowseComp versus 19.7% solo — more than doubled — at 85% lower cost than running Sonnet solo on the same task.

Those are not marginal improvements. The Haiku result in particular suggests a pattern where a genuinely weak executor, augmented by occasional Opus consultations, can outperform a mid-tier model running alone at a fraction of the cost. For high-volume pipelines processing thousands of agentic tasks per day, the difference between “Opus on everything” and “Haiku with Opus on hard turns” is the kind of number that finance departments actually notice.

The architectural contrast matters too. Traditional orchestrator-delegate systems pay orchestrator-model rates on every turn, regardless of decision complexity. The advisor pattern instead scales cost with problem difficulty: you pay Opus rates only on the turns that need them. This is not a theoretical distinction; it changes capacity planning for any team running production agents.
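
A back-of-the-envelope model makes the scaling argument concrete. Every number below is an illustrative placeholder, not Anthropic's actual pricing; substitute your contract rates and measured per-task token counts.

# Toy cost model: Opus-on-everything vs. Sonnet executor + Opus advisor.
# All prices and token counts are placeholders, and input-token costs are
# ignored for simplicity.
OPUS_PER_MTOK = 15.0     # $/M output tokens (placeholder)
SONNET_PER_MTOK = 3.0    # $/M output tokens (placeholder)

def per_task_cost(turns=40, tokens_per_turn=800, hard_fraction=0.15,
                  advisor_tokens=550):  # 550 = midpoint of the 400-700 range
    executor_tokens = turns * tokens_per_turn
    opus_solo = executor_tokens / 1e6 * OPUS_PER_MTOK
    # The advisor fires only on hard turns, so Opus spend tracks difficulty.
    advisor_total = turns * hard_fraction * advisor_tokens
    hybrid = (executor_tokens / 1e6 * SONNET_PER_MTOK
              + advisor_total / 1e6 * OPUS_PER_MTOK)
    return opus_solo, hybrid

opus, hybrid = per_task_cost()
print(f"Opus solo:        ${opus:.4f}/task")
print(f"Sonnet + advisor: ${hybrid:.4f}/task ({hybrid / opus:.0%} of Opus solo)")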

Three production traps will bite you if you skip testing:

  • max_tokens caps executor output only. Advisor tokens are unbounded by this parameter and billed separately at Opus rates in the usage block, so a runaway advisor consultation has no built-in ceiling.
  • Advisor responses do not stream. The complete block arrives at once, creating a perceptible pause in long-running agent loops that breaks real-time interaction patterns.
  • Priority Tier does not cascade. If you have purchased low-latency Priority Tier for Sonnet, Opus advisor calls still use standard throughput unless you buy Opus Priority separately. On latency-sensitive workloads, this can defeat the cost savings entirely.
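
The first trap is worth guarding in code. The sketch below is a post-hoc budget alarm; the usage-block field name advisor_output_tokens is hypothetical (the launch notes say advisor tokens are reported separately, but confirm the exact key against the live response schema):

ADVISOR_TOKEN_BUDGET = 2_000  # per-request alarm threshold; tune to your workload

def check_advisor_usage(response) -> None:
    # "advisor_output_tokens" is a hypothetical field name. The beta notes
    # say advisor usage is billed separately in the usage block; confirm
    # the exact key before relying on it.
    advisor_tokens = getattr(response.usage, "advisor_output_tokens", 0) or 0
    if advisor_tokens > ADVISOR_TOKEN_BUDGET:
        # max_tokens does not cap advisor output, so an alarm like this is
        # the only ceiling you get: log it, alert, or abort the agent loop.
        print(f"WARNING: advisor consumed {advisor_tokens} Opus tokens "
              f"(budget {ADVISOR_TOKEN_BUDGET})")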

The comparison to Anthropic’s Managed Agents is worth drawing explicitly. Managed Agents gives you infrastructure — sandboxing, state, session handling — at $0.08 per session-hour. The Advisor Tool gives you intelligence routing at the API level. They solve different problems but compound when used together: a Managed Agent session running Sonnet as executor with Opus advisory could deliver both infrastructure convenience and cost-optimized reasoning. If you are evaluating Anthropic’s agent platform holistically, test both, not one or the other.

The Take

I would implement this for every agentic pipeline currently running on Opus. The integration cost is a single header and one tool definition; you can have a benchmark running in under an hour. The potential savings on high-volume workflows are significant enough to justify the time even if you only shift 30% of your Opus traffic to Sonnet-with-advisor.

But I would not ship it to production without running your own eval suite first. Anthropic is transparent about this: results are task-dependent. If your workload genuinely requires Opus-level reasoning on every turn — some do — the advisor pattern just adds latency for no benefit. The sweet spot is workloads with variable reasoning requirements, where most turns are mechanical and a minority are genuinely hard. That describes most coding agents, most multi-turn tool-use pipelines, and most data processing workflows I have seen.

The deeper signal here is not about one API feature. It is about the cost curve of agentic AI becoming a design variable rather than a fixed expense. The teams that figure out where intelligence is actually needed in their pipelines, and route accordingly, will spend less than their competitors while matching or exceeding quality. The Advisor Tool is the first production-grade implementation of that principle from a frontier lab. Whether you use it directly or replicate the pattern with your own routing logic, the insight is the same: paying for intelligence you do not need on a given turn is waste, and eliminating that waste is now a one-header problem.

Run three configurations against your existing eval suite before committing: Sonnet solo, Sonnet + Opus advisor, and Opus solo. Compare cost and quality across all three. The advisor pattern wins most when your task distribution is skewed — lots of easy turns with occasional hard ones. If your distribution is flat and every turn is hard, stick with Opus end-to-end.
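
A skeleton for that three-way run, assuming the same beta strings as above plus assumed model IDs; run_task and the example prompt are stand-ins for your own eval harness.

import anthropic

client = anthropic.Anthropic()
BETA = {"anthropic-beta": "advisor-tool-2026-03-01"}
ADVISOR = [{"type": "advisor_20260301", "name": "advisor",
            "model": "claude-opus-4-6"}]

# Model IDs are assumptions; verify against the current model list.
CONFIGS = {
    "sonnet-solo":    dict(model="claude-sonnet-4-6"),
    "sonnet-advisor": dict(model="claude-sonnet-4-6",
                           tools=ADVISOR, extra_headers=BETA),
    "opus-solo":      dict(model="claude-opus-4-6"),
}

def run_task(prompt: str, **config):
    return client.messages.create(
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
        **config,
    )

for name, config in CONFIGS.items():
    # Replace with your eval harness: iterate tasks, score outputs, and
    # fold usage-block token counts into a per-task dollar cost.
    resp = run_task("example task from your eval suite", **config)
    print(name, resp.usage)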