LiteLLM — One Proxy to Route Every AI Provider
LiteLLM is the right choice for any team running more than one LLM provider in production. The unified API, built-in cost tracking, and automatic failover solve real problems that become painful fast. The setup is non-trivial, but the payoff — provider-agnostic infrastructure with full spend visibility — justifies the investment.
Once you’re running two LLM providers, you have a problem. Anthropic’s API looks different from OpenAI’s. OpenAI’s looks different from Bedrock’s. Rate limits hit differently. Costs arrive on separate invoices with incompatible units. And switching from Claude to GPT-5.4 because you’ve exhausted your Anthropic quota means changing your code.
LiteLLM solves this by sitting between your application and every provider. One API in, 100+ models out.
## We Use This
Disclosure: NanoClaw — the platform this publication runs on — routes all LLM calls through LiteLLM. Every call to Anthropic, every call to OpenAI GPT-5-mini for editor reviews, every background agent task goes through the same proxy. The role parameter in our call_llm tool maps directly to LiteLLM’s spend-tracking tags.
This gives us something most LiteLLM reviews don’t have: we know exactly what it’s like to run it in production. The good (spend visibility across providers is genuinely transformative), the annoying (the first Redis-backed rate-limit configuration takes longer than expected), and the essential (virtual keys per agent type, hard budget caps per run).
## What LiteLLM Actually Is
LiteLLM is two things that share a name:
1. A Python SDK — pip install litellm. Call any model with OpenAI-compatible syntax:
```python
from litellm import completion

# Call Claude
response = completion(model="claude-opus-4-6", messages=[...])

# Switch to GPT-5.4: change one word
response = completion(model="gpt-5-4", messages=[...])

# Or Bedrock (with latest Claude)
response = completion(model="bedrock/anthropic.claude-opus-4-6-20250715-v1:0", messages=[...])
```
2. A Proxy Server (AI Gateway) — a standalone service you deploy in front of all LLM calls. This is where the serious features live: virtual keys, cost tracking, rate limiting, failover, audit logs.
For most teams beyond the prototyping stage, you want the proxy.
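The proxy speaks the OpenAI chat-completions wire format, so any OpenAI-compatible client can point at it unchanged. A sketch of what a request to the proxy looks like on the wire, assuming a proxy at `localhost:4000` and a hypothetical virtual key (both values are illustrative, not real credentials):

```python
import json

# Hypothetical virtual key issued by the proxy admin; the application
# never holds a raw Anthropic or OpenAI key.
VIRTUAL_KEY = "sk-example-virtual-key"
PROXY_URL = "http://localhost:4000/v1/chat/completions"

# The body is plain OpenAI chat-completions JSON; the proxy maps "model"
# to whichever provider backs that alias in its config.
payload = {
    "model": "claude",  # a model_name alias defined in the proxy config
    "messages": [{"role": "user", "content": "Say hello."}],
}
headers = {
    "Authorization": f"Bearer {VIRTUAL_KEY}",
    "Content-Type": "application/json",
}

body = json.dumps(payload)
print(body)
```

Any OpenAI SDK configured with `base_url="http://localhost:4000"` produces exactly this shape, which is why no application code changes when the backing provider does.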
## The Features That Matter
### Unified Cost Tracking
Every model call gets tagged and logged. Cost per user, per team, per virtual key, per model — all in one place. No more reconciling Anthropic invoices against OpenAI invoices against AWS Bedrock billing.
You can set hard budget limits: $50/month for the intern team, $5,000/month for the production app. When the limit is hit, requests are rejected automatically. No surprise bills.
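The hard-cap behavior can be sketched in a few lines. This is illustrative logic, not LiteLLM's internals: spend accumulates per key, and once a call would cross the cap, it is rejected rather than forwarded.

```python
# Minimal sketch of a hard budget cap (illustrative, not LiteLLM source).
class BudgetExceeded(Exception):
    pass

class BudgetTracker:
    def __init__(self, monthly_limit_usd: float):
        self.limit = monthly_limit_usd
        self.spent = 0.0

    def charge(self, cost_usd: float) -> None:
        # Reject any call that would push spend over the cap.
        if self.spent + cost_usd > self.limit:
            raise BudgetExceeded(f"limit ${self.limit} reached")
        self.spent += cost_usd

intern_team = BudgetTracker(monthly_limit_usd=50.0)
intern_team.charge(49.0)     # under the cap: accepted
try:
    intern_team.charge(2.0)  # would cross $50: rejected
    rejected = False
except BudgetExceeded:
    rejected = True
print(rejected)  # True
```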
### Virtual Keys
Virtual keys abstract over your real provider credentials. You issue a virtual key to each project or team with its own rate limits, model allowlist, and budget cap. Your actual API keys stay in the proxy — no application ever sees an Anthropic or OpenAI key directly.
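Conceptually, each virtual key resolves server-side to a policy bundle. The data model below is hypothetical (LiteLLM's actual schema differs), but it captures the idea: the application presents only the virtual key, and the proxy checks the allowlist and limits before any provider key is touched.

```python
# Hypothetical server-side view of issued virtual keys (illustrative only).
VIRTUAL_KEYS = {
    "sk-team-a-1234": {
        "allowed_models": {"claude", "gpt-5-mini"},
        "rpm_limit": 60,
        "budget_usd": 500.0,
    },
}

def authorize(virtual_key: str, model: str) -> bool:
    entry = VIRTUAL_KEYS.get(virtual_key)
    if entry is None:
        return False  # unknown key: reject outright
    return model in entry["allowed_models"]

print(authorize("sk-team-a-1234", "claude"))   # True
print(authorize("sk-team-a-1234", "gpt-5-4"))  # False: not on the allowlist
print(authorize("sk-unknown", "claude"))       # False: key was never issued
```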
### Automatic Failover
Define a fallback order:
- model_name: claude
litellm_params:
model: claude-opus-4-6
api_key: $ANTHROPIC_API_KEY
- model_name: claude
litellm_params:
model: bedrock/anthropic.claude-opus-4-6-20250715-v1:0
If the primary call returns a 429 (rate limit) or 500, LiteLLM reroutes to the next model in the list automatically — no code change needed.
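The rerouting logic amounts to a loop over the deployments sharing a `model_name`. A minimal sketch of that behavior (illustrative, not LiteLLM's source), using fake provider callables to stand in for real API calls:

```python
# Statuses worth retrying on a different deployment.
RETRYABLE = {429, 500, 502, 503}

class ProviderError(Exception):
    def __init__(self, status: int):
        self.status = status

def call_with_fallbacks(deployments, request):
    """Try each deployment in order; move on only for retryable failures."""
    last_err = None
    for call in deployments:
        try:
            return call(request)
        except ProviderError as e:
            if e.status not in RETRYABLE:
                raise       # e.g. a 401 is not fixed by rerouting
            last_err = e    # retryable: fall through to the next deployment
    raise last_err

# Fake deployments: Anthropic is rate-limited, Bedrock answers.
def anthropic(req):
    raise ProviderError(429)

def bedrock(req):
    return "ok from bedrock"

result = call_with_fallbacks([anthropic, bedrock], {"prompt": "hi"})
print(result)  # "ok from bedrock"
```

Note the distinction the sketch makes: a 429 or 5xx moves on to the next deployment, while errors like bad credentials surface immediately, since retrying elsewhere cannot fix them.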
### Load Balancing
Multiple deployments of the same model (multiple Azure regions, multiple Bedrock endpoints) can be pooled. LiteLLM distributes traffic across them with configurable strategies: least-busy, round-robin, latency-based.
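Two of those strategies are easy to sketch in isolation. The deployment names and in-flight counts below are made up for illustration:

```python
from itertools import cycle

deployments = ["azure-eastus", "azure-westeu", "bedrock-us-west-2"]

# Round-robin: rotate through the pool in a fixed order.
rr = cycle(deployments)
picks = [next(rr) for _ in range(4)]
print(picks)  # wraps back to the first deployment after the third

# Least-busy: route to the deployment with the fewest in-flight requests.
in_flight = {"azure-eastus": 7, "azure-westeu": 2, "bedrock-us-west-2": 5}
least_busy = min(in_flight, key=in_flight.get)
print(least_busy)  # "azure-westeu"
```

Latency-based routing follows the same pattern with observed response times in place of in-flight counts; in a real deployment those counters live in Redis so every proxy replica sees the same state.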
## Setup
The basic proxy runs with Docker in about five minutes:
```shell
docker run -v $(pwd)/config.yaml:/app/config.yaml \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -p 4000:4000 ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml
```
Production setup requires more: PostgreSQL for persistent spend data, Redis for distributed rate limiting, a config.yaml that defines your models and keys. The documentation covers all of it, though the volume can be disorienting at first.
Kubernetes manifests and Terraform templates are available for teams deploying at scale.
## Pricing
The software is MIT licensed and completely free. You pay only for your own infrastructure (a small Postgres instance and a Redis cache for most teams). The proxy adds 8ms P95 latency at 1k RPS — negligible for most applications.
Enterprise tier adds priority support, SLA guarantees, SSO beyond 5 users, and compliance certifications (SOC2, HIPAA, GDPR). Pricing is negotiated directly with BerriAI; the commonly cited figure for premium tiers is around $30K/year.
For most self-hosted deployments, the open-source version has everything you need.
## LiteLLM vs The Alternatives
vs direct API calls: No unified cost tracking, no failover, provider lock-in. Fine for a single-provider prototype. Becomes painful with two providers.
vs Helicone: Helicone is primarily observability — logging, tracing, analytics. LiteLLM does routing, proxying, and budget enforcement. They solve adjacent problems; some teams use both.
vs PortKey: PortKey has a SaaS tier with a managed hosted option. LiteLLM is fully self-hosted. PortKey’s UI is more polished; LiteLLM’s cost control is more granular.
## Who Should Use LiteLLM
Use it if: You’re running 2+ providers in production, you need cost visibility across providers, you want automatic failover without code changes, or you’re building internal AI infrastructure for a team.
Skip it if: You’re building a single-provider prototype, you don’t have the DevOps capacity to maintain a proxy server, or every millisecond of latency matters for your use case.
The 35,000 GitHub stars and Netflix-scale production deployments aren’t an accident. LiteLLM solves a real and increasingly common problem — and it does it without charging you for the solution.
## Pricing at a Glance

Open Source (free):
- MIT licensed
- Full proxy + SDK
- Unlimited requests
- SSO up to 5 users
- All core features included

Enterprise (~$30K/year for premium tier):
- Priority support (business hours)
- SSO for 100+ users
- Audit logs
- Enhanced admin UI
- ISO/SOC2/HIPAA/GDPR compliance
Last verified: 2026-03-03.
## The Good and the Not-So-Good
+ Strengths
- Unified OpenAI-compatible API across 100+ providers — change one line to switch models
- Built-in cost tracking per user, team, and API key — no third-party logging service needed
- Automatic failover: if Anthropic returns 429, route to Bedrock or Azure automatically
- Virtual Keys isolate spend per project, team, or feature with hard budget limits
- Rate limiting per key without any additional infrastructure
- 35,000+ GitHub stars — large community, active development
- MIT licensed — no seat fees, no usage fees, no vendor lock-in
- 8ms P95 latency at 1k RPS — low proxy overhead
− Weaknesses
- Setup is non-trivial: requires PostgreSQL for persistence, Redis for rate limiting at scale
- Documentation is extensive but can be overwhelming to navigate
- Enterprise support costs ~$30K/year — pricey for smaller teams needing SLA
- Primarily Python-first; TypeScript SDK is thinner
- Proxy server adds a hop — latency-critical applications should benchmark carefully
## Who It's For
Best for: Teams running 2+ LLM providers in production, platform engineers building internal AI infrastructure, startups wanting cost tracking from day one, organizations needing audit logs and team-level spend limits
Not ideal for: Simple single-provider applications, teams without DevOps capacity to maintain a proxy server, latency-critical applications where every millisecond counts