Code with Claude — Don't Update Your Model String Tomorrow
Anthropic's Code with Claude hits SF May 6. Claude Jupiter is in active red teaming — the same pattern that preceded Claude 4. Here's what to watch.
Anthropic is running red-teaming sessions on a model codenamed “Jupiter-v1-p” right now, days before its Code with Claude developer conference in San Francisco on May 6. This is the exact same pre-announcement choreography that preceded the Claude 4 family reveal at last year’s Code with Claude event — planet codename, safety probe, then keynote drop. If you’re running Claude in production pipelines, tomorrow morning is when you find out whether your cost model still holds.
TL;DR
- What: Anthropic’s Code with Claude conference hits SF May 6; a model codenamed Jupiter is in active red teaming as of May 1
- Pattern: Planet-codename red teaming → conference keynote → model launch. Neptune preceded Claude 4 in May 2025. Jupiter follows the identical playbook
- Risk: If Jupiter ships a new model branch, expect tokenizer or pricing changes that break existing automation assumptions — Opus 4.7 already caught teams off-guard with a 35% token inflation
- Action: Audit your Claude API pipelines today, not after the keynote
Code with Claude Tomorrow — What Happened
TestingCatalog confirmed on May 1 that Anthropic began internal red teaming of “Jupiter-v1-p” ahead of the May 6 event. Anthropic uses planet codenames for pre-release safety probes to keep the actual product designation hidden until launch — the same convention that tagged the Claude 4 family as “Neptune” weeks before its public announcement in May 2025.
The official Code with Claude agenda explicitly lists sessions on “what today’s models can do and where they’re headed, from the Anthropic researchers and engineers building them.” That’s the same session format that preceded the Claude 4 reveal. You don’t book your research team for a keynote to talk about what already shipped.
What remains unclear is which model Jupiter becomes. The March 31 Claude Code source leak exposed references to unreleased model tiers — including Capybara/Mythos, positioned above Opus — but no source directly ties Jupiter to a specific public model number.
Reported but unconfirmed: Secondary sources claim the next Sonnet skips 4.7 and lands at 4.8, suggesting a different internal branch rather than a standard Opus-to-Sonnet port. This comes from leaked source references, not official Anthropic disclosure. Separately, Geeky Gadgets reported “Claude Cardinal” — allegedly a visual analytics feature for user activity and memory usage within Claude Code — as another conference announcement. No official Anthropic source confirms either claim. Treat both as signal, not fact.
Why This Matters
The pattern here is more important than the codename. Anthropic has established Code with Claude as its model launch vehicle. Last year, Neptune → Claude 4. This year, Jupiter → something. The question isn’t whether a new model ships. It’s what breaks when it does.
Here’s the concrete risk: Claude Opus 4.7 launched on April 16 with a new tokenizer that produces up to 35% more tokens for identical input compared to previous models. Teams running automated pipelines — CI/CD generation, code review, batch processing — only discovered this in production when their bills spiked. Anthropic documented the range as 1.0× to 1.35× in their platform docs, but the announcement didn’t lead with that number. If Jupiter ships a model from a different internal branch, the tokenizer behavior could diverge again, and you won’t know until your invoices land.
The Opus 4.7 tokenizer change produced up to 35% more tokens for identical input. Any new model from a different branch should be tested against your existing prompts before you update API calls. Run a cost comparison on 100 representative requests before switching model strings in production.
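That comparison reduces to a small harness. A minimal sketch: the inflation math is plain arithmetic, while the commented SDK call is an assumption — the Anthropic Python SDK exposes a token-counting endpoint (`messages.count_tokens`), but verify its current shape and the model strings against the official docs before relying on it.

```python
from statistics import mean

def token_inflation(old_counts: list[int], new_counts: list[int]) -> float:
    """Average per-prompt token inflation ratio across a sample set."""
    assert old_counts and len(old_counts) == len(new_counts)
    return mean(new / old for old, new in zip(old_counts, new_counts))

# Gathering the counts is one call per prompt per model. Hypothetical usage,
# to be checked against the Anthropic SDK docs:
#
#   client = anthropic.Anthropic()
#   n = client.messages.count_tokens(
#       model="claude-sonnet-4-6",  # your current production model string
#       messages=[{"role": "user", "content": prompt}],
#   ).input_tokens

if __name__ == "__main__":
    old = [812, 1430, 655]   # tokens per prompt on the current model (sampled)
    new = [1000, 1790, 900]  # same prompts on the candidate model
    print(f"avg inflation: {token_inflation(old, new):.2f}x")
```

Run it over your 100 representative requests; a ratio drifting toward 1.35x is the signal to hold off on the model string update until you have re-budgeted.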
The benchmark picture matters too, but only if you read it carefully. Claude Opus 4.7 holds 87.6% on SWE-bench Verified and 64.3% on SWE-bench Pro. These are different benchmarks with different difficulty levels and different memorization screening methodologies — Anthropic updated Opus 4.7’s scores twice after launch as screening improved. Any Jupiter announcement needs to specify which benchmark, which harness version, and whether memorization screens were applied. If the keynote shows a single number on a slide without that context, discount it.
The broader competitive context adds urgency. GPT-5.5 leads on standard SWE-bench at 82.6%, while Opus 4.7 holds SWE-bench Verified at 87.6%. These aren’t directly comparable, which is exactly the point — model vendors cherry-pick the benchmark that flatters their numbers. If Jupiter’s announcement follows the same playbook, the headline number will tell you less than the methodology footnote.
What makes this announcement cycle different from routine model updates is the convergence of signals from the March source leak. That leak revealed KAIROS (always-on persistent agents), ULTRAPLAN (30-minute cloud planning runtime), and Capybara/Mythos (a tier above Opus). If Jupiter is the public name for any of these capabilities, it’s not a point release — it’s a platform shift. And platform shifts carry pricing changes that cascade through every integration.
Before the keynote: document your current model string, average tokens per request, and monthly spend. After the announcement: run the same prompt set against the new model and compare. This takes 30 minutes and can save you from a billing surprise that takes weeks to diagnose.
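The audit above is simple arithmetic once you have the numbers from your logs. A minimal sketch — the per-million-token price and traffic figures here are placeholders, not Anthropic's published rates; substitute your own and the official pricing page:

```python
def monthly_spend(avg_tokens_per_request: int,
                  requests_per_month: int,
                  usd_per_million_tokens: float) -> float:
    """Projected monthly input-token spend for one pipeline."""
    total_tokens = avg_tokens_per_request * requests_per_month
    return total_tokens / 1_000_000 * usd_per_million_tokens

# Placeholder figures; pull the real ones from your usage logs.
baseline   = monthly_spend(2_000, 50_000, 3.00)               # current model
worst_case = monthly_spend(int(2_000 * 1.35), 50_000, 3.00)   # 1.35x inflation

print(f"baseline:   ${baseline:,.2f}/mo")                     # $300.00/mo
print(f"worst case: ${worst_case:,.2f}/mo "                   # $405.00/mo
      f"(+{worst_case / baseline - 1:.0%})")                  # +35%
```

If the worst-case delta exceeds what a 48-hour wait costs you in capability, the decision makes itself.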
The Take
Anthropic has turned Code with Claude into the model drop event that Google I/O and OpenAI Dev Day wish they were — focused, developer-targeted, and tied to actual shipping product. The Jupiter red-teaming signal is as close to a confirmed pre-announcement as you get without an Anthropic blog post.
But here’s what I’m watching more than the model name: tokenizer behavior and pricing tier. If Jupiter ships a mid-tier model with meaningfully different tokenization than Sonnet 4.6, every automated pipeline using claude-sonnet-4-6 needs an immediate cost audit. We saw this exact failure mode with Opus 4.7 — teams updated their model string, didn’t test token counts, and ate a 20-35% cost increase on identical workloads. The model’s capability improved; the bill grew faster.
My recommendation is blunt: don’t update your model string tomorrow. Watch the keynote, read the pricing page (not the blog post — the pricing page), run your prompt regression suite against the new model, and compare token counts before you touch production. The teams that got burned on Opus 4.7 were the ones who updated within hours of the announcement. The teams that didn’t get burned were the ones who waited 48 hours and ran the numbers.
If you’re on Claude Max at $100-200/month, the subscription absorbs tokenizer changes. If you’re on API billing, you’re exposed. Know which one you are before 10 AM Pacific tomorrow.