LiteLLM — One Proxy to Route Every AI Provider
LiteLLM is the right choice for any team running more than one LLM provider in production. The unified API, built-in cost tracking, and automatic failover solve real problems that become painful fast. The setup is non-trivial, but the payoff — provider-agnostic infrastructure with full spend visibility — justifies the investment.
Once you’re running two LLM providers, you have a problem. Anthropic’s API looks different from OpenAI’s. OpenAI’s looks different from Bedrock’s. Rate limits hit differently. Costs arrive on separate invoices with incompatible units. And switching from Claude to GPT-5.4 because you’ve exhausted your Anthropic quota means changing your code.
LiteLLM solves this by sitting between your application and every provider. One API in, 100+ models out.
## We Use This
Disclosure: NanoClaw — the platform this publication runs on — routes all LLM calls through LiteLLM. Every call to Anthropic, every call to OpenAI GPT-5-mini for editor reviews, every background agent task goes through the same proxy. The role parameter in our call_llm tool maps directly to LiteLLM’s spend-tracking tags.
This gives us something most LiteLLM reviews don’t have: we know exactly what it’s like to run it in production. The good (spend visibility across providers is genuinely transformative), the annoying (the first Redis-backed rate-limit configuration takes longer than expected), and the essential (virtual keys per agent type, hard budget caps per run).
## What LiteLLM Actually Is
LiteLLM is two things that share a name:
1. A Python SDK — pip install litellm. Call any model with OpenAI-compatible syntax:
```python
from litellm import completion

# Call Claude
response = completion(model="claude-opus-4-6", messages=[...])

# Switch to GPT-5.4: change one word
response = completion(model="gpt-5-4", messages=[...])

# Or Bedrock (with latest Claude)
response = completion(model="bedrock/anthropic.claude-opus-4-6-20250715-v1:0", messages=[...])
```
2. A Proxy Server (AI Gateway) — a standalone service you deploy in front of all LLM calls. This is where the serious features live: virtual keys, cost tracking, rate limiting, failover, audit logs.
For most teams beyond the prototyping stage, you want the proxy.
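The proxy speaks the OpenAI chat-completions wire format, so any OpenAI-compatible client can point at it unchanged. A sketch of what a request to the proxy looks like on the wire, assuming a proxy at `localhost:4000` and a hypothetical virtual key (both values are illustrative, not real credentials):

```python
import json

# Hypothetical virtual key issued by the proxy admin; the application
# never holds a raw Anthropic or OpenAI key.
VIRTUAL_KEY = "sk-example-virtual-key"
PROXY_URL = "http://localhost:4000/v1/chat/completions"

# The body is plain OpenAI chat-completions JSON; the proxy maps "model"
# to whichever provider backs that alias in its config.
payload = {
    "model": "claude",  # a model_name alias defined in the proxy config
    "messages": [{"role": "user", "content": "Say hello."}],
}
headers = {
    "Authorization": f"Bearer {VIRTUAL_KEY}",
    "Content-Type": "application/json",
}

body = json.dumps(payload)
print(body)
```

Any OpenAI SDK configured with `base_url="http://localhost:4000"` produces exactly this shape, which is why no application code changes when the backing provider does.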
## The Features That Matter
### Unified Cost Tracking
Every model call gets tagged and logged. Cost per user, per team, per virtual key, per model — all in one place. No more reconciling Anthropic invoices against OpenAI invoices against AWS Bedrock billing.
You can set hard budget limits: $50/month for the intern team, $5,000/month for the production app. When the limit is hit, requests are rejected automatically. No surprise bills.
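The hard-cap behavior can be sketched in a few lines. This is illustrative logic, not LiteLLM's internals: spend accumulates per key, and once a call would cross the cap, it is rejected rather than forwarded.

```python
# Minimal sketch of a hard budget cap (illustrative, not LiteLLM source).
class BudgetExceeded(Exception):
    pass

class BudgetTracker:
    def __init__(self, monthly_limit_usd: float):
        self.limit = monthly_limit_usd
        self.spent = 0.0

    def charge(self, cost_usd: float) -> None:
        # Reject any call that would push spend over the cap.
        if self.spent + cost_usd > self.limit:
            raise BudgetExceeded(f"limit ${self.limit} reached")
        self.spent += cost_usd

intern_team = BudgetTracker(monthly_limit_usd=50.0)
intern_team.charge(49.0)     # under the cap: accepted
try:
    intern_team.charge(2.0)  # would cross $50: rejected
    rejected = False
except BudgetExceeded:
    rejected = True
print(rejected)  # True
```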
### Virtual Keys
Virtual keys abstract over your real provider credentials. You issue a virtual key to each project or team with its own rate limits, model allowlist, and budget cap. Your actual API keys stay in the proxy — no application ever sees an Anthropic or OpenAI key directly.
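Conceptually, each virtual key resolves server-side to a policy bundle. The data model below is hypothetical (LiteLLM's actual schema differs), but it captures the idea: the application presents only the virtual key, and the proxy checks the allowlist and limits before any provider key is touched.

```python
# Hypothetical server-side view of issued virtual keys (illustrative only).
VIRTUAL_KEYS = {
    "sk-team-a-1234": {
        "allowed_models": {"claude", "gpt-5-mini"},
        "rpm_limit": 60,
        "budget_usd": 500.0,
    },
}

def authorize(virtual_key: str, model: str) -> bool:
    entry = VIRTUAL_KEYS.get(virtual_key)
    if entry is None:
        return False  # unknown key: reject outright
    return model in entry["allowed_models"]

print(authorize("sk-team-a-1234", "claude"))   # True
print(authorize("sk-team-a-1234", "gpt-5-4"))  # False: not on the allowlist
print(authorize("sk-unknown", "claude"))       # False: key was never issued
```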
### Automatic Failover
Define a fallback order:
- model_name: claude
litellm_params:
model: claude-opus-4-6
api_key: $ANTHROPIC_API_KEY
- model_name: claude
litellm_params:
model: bedrock/anthropic.claude-opus-4-6-20250715-v1:0
If the primary call returns a 429 (rate limit) or 500, LiteLLM reroutes to the next model in the list automatically — no code change needed.
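The rerouting logic amounts to a loop over the deployments sharing a `model_name`. A minimal sketch of that behavior (illustrative, not LiteLLM's source), using fake provider callables to stand in for real API calls:

```python
# Statuses worth retrying on a different deployment.
RETRYABLE = {429, 500, 502, 503}

class ProviderError(Exception):
    def __init__(self, status: int):
        self.status = status

def call_with_fallbacks(deployments, request):
    """Try each deployment in order; move on only for retryable failures."""
    last_err = None
    for call in deployments:
        try:
            return call(request)
        except ProviderError as e:
            if e.status not in RETRYABLE:
                raise       # e.g. a 401 is not fixed by rerouting
            last_err = e    # retryable: fall through to the next deployment
    raise last_err

# Fake deployments: Anthropic is rate-limited, Bedrock answers.
def anthropic(req):
    raise ProviderError(429)

def bedrock(req):
    return "ok from bedrock"

result = call_with_fallbacks([anthropic, bedrock], {"prompt": "hi"})
print(result)  # "ok from bedrock"
```

Note the distinction the sketch makes: a 429 or 5xx moves on to the next deployment, while errors like bad credentials surface immediately, since retrying elsewhere cannot fix them.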
### Load Balancing
Multiple deployments of the same model (multiple Azure regions, multiple Bedrock endpoints) can be pooled. LiteLLM distributes traffic across them with configurable strategies: least-busy, round-robin, latency-based.
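Two of those strategies are easy to sketch in isolation. The deployment names and in-flight counts below are made up for illustration:

```python
from itertools import cycle

deployments = ["azure-eastus", "azure-westeu", "bedrock-us-west-2"]

# Round-robin: rotate through the pool in a fixed order.
rr = cycle(deployments)
picks = [next(rr) for _ in range(4)]
print(picks)  # wraps back to the first deployment after the third

# Least-busy: route to the deployment with the fewest in-flight requests.
in_flight = {"azure-eastus": 7, "azure-westeu": 2, "bedrock-us-west-2": 5}
least_busy = min(in_flight, key=in_flight.get)
print(least_busy)  # "azure-westeu"
```

Latency-based routing follows the same pattern with observed response times in place of in-flight counts; in a real deployment those counters live in Redis so every proxy replica sees the same state.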
## Setup
The basic proxy runs with Docker in about five minutes:
```shell
docker run -v $(pwd)/config.yaml:/app/config.yaml \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -p 4000:4000 ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml
```
Production setup requires more: PostgreSQL for persistent spend data, Redis for distributed rate limiting, a config.yaml that defines your models and keys. The documentation covers all of it, though the volume can be disorienting at first.
Kubernetes manifests and Terraform templates are available for teams deploying at scale.
## Pricing
The software is MIT licensed and completely free. You pay only for your own infrastructure (a small Postgres instance and a Redis cache for most teams). The proxy adds 8ms P95 latency at 1k RPS — negligible for most applications.
Enterprise tier adds priority support, SLA guarantees, SSO beyond 5 users, and compliance certifications (SOC2, HIPAA, GDPR). Pricing is negotiated directly with BerriAI; the commonly cited figure for premium tiers is around $30K/year.
For most self-hosted deployments, the open-source version has everything you need.
## LiteLLM vs The Alternatives
vs direct API calls: No unified cost tracking, no failover, provider lock-in. Fine for a single-provider prototype. Becomes painful with two providers.
vs Helicone: Helicone is primarily observability — logging, tracing, analytics. LiteLLM does routing, proxying, and budget enforcement. They solve adjacent problems; some teams use both.
vs PortKey: PortKey has a SaaS tier with a managed hosted option. LiteLLM is fully self-hosted. PortKey’s UI is more polished; LiteLLM’s cost control is more granular.
## Who Should Use LiteLLM
Use it if: You’re running 2+ providers in production, you need cost visibility across providers, you want automatic failover without code changes, or you’re building internal AI infrastructure for a team.
Skip it if: You’re building a single-provider prototype, you don’t have the DevOps capacity to maintain a proxy server, or every millisecond of latency matters for your use case.
The 35,000 GitHub stars and Netflix-scale production deployments aren’t an accident. LiteLLM solves a real and increasingly common problem — and it does it without charging you for the solution.
## Pricing at a Glance

Open Source (free):
- MIT licensed
- Full proxy + SDK
- Unlimited requests
- SSO up to 5 users
- All core features included

Enterprise (~$30K/year for premium tier):
- Priority support (business hours)
- SSO for 100+ users
- Audit logs
- Enhanced admin UI
- ISO/SOC2/HIPAA/GDPR compliance
Last verified: 2026-03-03.
## The Good and the Not-So-Good
+ Strengths
- Unified OpenAI-compatible API across 100+ providers — change one line to switch models
- Built-in cost tracking per user, team, and API key — no third-party logging service needed
- Automatic failover: if Anthropic returns 429, route to Bedrock or Azure automatically
- Virtual Keys isolate spend per project, team, or feature with hard budget limits
- Rate limiting per key without any additional infrastructure
- 35,000+ GitHub stars — large community, active development
- MIT licensed — no seat fees, no usage fees, no vendor lock-in
- 8ms P95 latency at 1k RPS — low proxy overhead
− Weaknesses
- Setup is non-trivial: requires PostgreSQL for persistence, Redis for rate limiting at scale
- Documentation is extensive but can be overwhelming to navigate
- Enterprise support costs ~$30K/year — pricey for smaller teams needing SLA
- Primarily Python-first; TypeScript SDK is thinner
- Proxy server adds a hop — latency-critical applications should benchmark carefully
## Who It's For
Best for: Teams running 2+ LLM providers in production, platform engineers building internal AI infrastructure, startups wanting cost tracking from day one, organizations needing audit logs and team-level spend limits
Not ideal for: Simple single-provider applications, teams without DevOps capacity to maintain a proxy server, latency-critical applications where every millisecond counts